RTP over TLS

Georg-August-Universität Göttingen, Zentrum für Informatik
ISSN 1612-6793, Number GAUG-ZFI-BM-2007-28
Master's thesis
in the degree programme "Applied Computer Science"
RTP over Datagram TLS
John-Patrick Wowra
Computer Networks Group
Bachelor's and Master's Theses of the Zentrum für Informatik at the Georg-August-Universität Göttingen, 17 September 2007
Georg-August-Universität Göttingen, Zentrum für Informatik, Lotzestraße 16-18, 37083 Göttingen, Germany
Tel. +49 (5 51) 39-1 44 14, Fax +49 (5 51) 39-1 44 15, Email office@informatik.uni-goettingen.de
WWW www.informatik.uni-goettingen.de
I hereby declare that I have written this thesis independently and have used no sources or aids other than those indicated. Göttingen, 17 September 2007
Supervised by Prof. Dr. Fu, Computer Networks Group, Georg-August-Universität Göttingen
Acknowledgement
I would like to thank my advisor, Prof. Dr. Xiaoming Fu, for his excellent guidance, motivation and encouragement; my parents and Kateřina for their support; and Christian Dickmann for his patience and helpfulness.
Abstract
The popularity of Internet Telephony has risen continuously in recent years. With a rising number of users, the number of malicious users inevitably rises as well. Hence security is a major concern for Internet Telephony. RTP is commonly used in Internet Telephony for the transmission and reception of audio and video data. Traditionally, RTP runs over UDP, and RTP traffic is in most cases transmitted without any protection. Datagram TLS (DTLS) is a modified version of TLS that functions properly over datagram transport. This thesis studies an RTP extension based on DTLS. It includes a prototype implementation and a further analysis of the design, with the aim of securing RTP and thus Internet Telephony.
Contents
1 Introduction
  1.1 Motivation
  1.2 Goals
  1.3 Thesis Organisation
2 Background
  2.1 Voice over IP
  2.2 Real Time Transport Protocol
  2.3 SSL/TLS and DTLS
  2.4 Session Initiation Protocol SIP
3 Related Work
  3.1 Security in VoIP
    3.1.1 Internet Protocol Security, IPsec
    3.1.2 Comparison between IPsec and DTLS
  3.2 Secure Real Time Transport Protocol
4 Security Considerations for VoIP
  4.1 Introduction
    4.1.1 Confidentiality in VoIP
    4.1.2 Availability in VoIP
  4.2 Threats and Attacks
5 RTP over DTLS
  5.1 Introduction to RTP over DTLS
    5.1.1 SRTP Compatibility Mode
    5.1.2 Packet size Comparison
    5.1.3 Security Considerations
6 Implementation Design
  6.1 Analysis of Requirements
  6.2 System Idea/Intent
    6.2.1 DTLS
    6.2.2 RTP
    6.2.3 SIP Softphone
  6.3 RTP over DTLS
  6.4 Choice of Libraries
    6.4.1 OpenSSL
    6.4.2 CCRTP
    6.4.3 Twinkle Softphone
7 Design Details
  7.1 Design Components: RTP - ccRTP, DTLS - OpenSSL and SIP - Twinkle
    7.1.1 OpenSSL
    7.1.2 Socket Initialisation
    7.1.3 Session Initialisation with ccRTP
    7.1.4 Sending Data
    7.1.5 Receiving Data
    7.1.6 Closing Sessions
    7.1.7 Types of Sessions
  7.2 SIP Session Initiation with Twinkle
  7.3 Implementation Process
  7.4 Class Structure
  7.5 Problems and Discussion
8 Testing
  8.1 Testing Methodology
  8.2 Testbed Setup
  8.3 Measurement Methods and Tools
  8.4 Results
  8.5 Standard RTP Packet Delay
  8.6 RTP over DTLS Packet Delay
  8.7 CPU Usage
  8.8 Test Summary
9 Conclusion and Future Work
  9.1 Conclusion
  9.2 Future Work and Open Issues
Bibliography
List of Figures
2.1 Structure of an RTP packet
2.2 Schematic representation of the SSL handshake protocol with two way authentication with certificates [1]
2.3 DTLS in the TCP/IP stack
2.4 DTLS packet structure
2.5 DTLS state machine [2]
2.6 Initialisation of a SIP session
3.1 IPsec in the TCP/IP stack
3.2 Structure of an IPsec packet with AH
3.3 Structure of an IPsec packet with ESP
5.1 Structure of an RTP packet sent over DTLS
7.1 Implementation status after phase 1
7.2 Implementation status after phase 2
7.3 Implementation status after phase 3
7.4 RTP over DTLS class structure
8.1 Testbed for RTP over DTLS tests
8.2 Delay for normal RTP packets
8.3 Delay for RTP over DTLS packets
1 Introduction
1.1 Motivation
Today enterprises have to maintain two networks in order to use both Internet and telephone services. However, traditional landline phones are gradually being replaced by Internet phones because of their advantages. Internet Telephony is the routing of voice information over the Internet (or other IP-based networks). The telephone calls are handled by protocols which are commonly referred to as Voice over Internet Protocol (VoIP).

VoIP technology provides a wide range of services to users. As an additional feature VoIP offers, for example, video calls. VoIP calls are also cheaper than traditional phone calls; calls between two VoIP participants are even free. Enterprises with branches in different cities that are connected by a VPN can use VoIP technology for internal communication between the branches and thereby reduce costs significantly. Besides the reduction of call costs, the infrastructure has become more flexible, because VoIP technology provides open platforms in contrast to traditional Telephony. In traditional Telephony networks the standards were known only to a small circle of developers at the network provider. With VoIP, the protocols, software and tools can be improved and adjusted to the needs of the users, and not only by their original developers.

The total number of VoIP users has been rising continuously over the past years. With a rising number of users, the number of malicious users inevitably rises as well. Hence
security is a major concern for Internet Telephony. Since VoIP is based on IP [3], it is vulnerable to all of the attacks that plague traditional IP networks, such as packet snooping, unauthorised access, spoofing and especially denial of service attacks. A conversation over a traditional phone is usually established over the communications provider's network. All companies involved in the connection are known and have to be trusted. With VoIP, data is transmitted through many networks whose providers are not all known. Anyone with access to a machine along the path of communication could access the transmitted data. Therefore VoIP calls are more vulnerable to eavesdropping than landline calls.

This problem is, however, well known from other applications that transmit confidential data over an insecure network such as the Internet. Cryptographic protocols can be used to protect data from being eavesdropped or altered. A well-known security protocol that is considered reliable is Secure Sockets Layer/Transport Layer Security (SSL/TLS) [4]. SSL/TLS resides above the transport layer and commonly uses the Transmission Control Protocol (TCP) [5] or a similar protocol. TCP is a reliable, connection-oriented protocol with mechanisms for buffering and retransmission. It thereby ensures that the received data is exactly the same as the data transmitted. This is, however, not the primary desired feature in VoIP. The problem lies in the buffering and retransmission mechanisms. The data is sent over the unreliable IP protocol, so packets might not arrive in the order in which they were transmitted, or they may get lost on the way. TCP puts the packets back into the right order and waits for lost packets to be retransmitted in order to reassemble the data as it was sent.
This is very useful for services like e-mail, where the received data must be identical to the data transmitted. In VoIP, however, a stream of data is played continuously to the receiver, and a delay caused by retransmission results in a pause in the media stream. A delay in VoIP is defined as the time the voice takes on its way from the mouth of the speaker to the ear of the listener. It is the sum of the time needed to digitise the voice into audio data, split the stream of audio data into data packets and transmit the data to
the destination. Delays are commonly known from traditional telephony. For instance, long distance phone calls used to have quite long delays before the spoken word was received on the other side. Such delays make a fluent conversation, as in a face-to-face conversation, impossible. Therefore the highest priority for VoIP traffic is not the exact transmission of the data; instead, the data needs to be transmitted to the receiver as fast as possible to reduce the delays caused by transmission over the Internet. Thus VoIP protocols such as the Real Time Transport Protocol (RTP) [6] rely on connectionless transmission using the User Datagram Protocol (UDP) [7]. UDP has no mechanism for retransmission of lost data packets. Hence in RTP lost, damaged or late packets are discarded and the media stream is played continuously. If a packet gets lost, the next received packet is played immediately, and as long as the number of lost packets does not exceed a certain amount, the receiver does not even notice that packets are lost.

With the goal of securing real-time media such as VoIP, TLS was enhanced to work with UDP datagrams. This advancement of TLS is called Datagram Transport Layer Security (DTLS) [8], and it was standardised in spring 2006 by the Internet Engineering Task Force (IETF¹). At the same time the IETF published an Internet Draft on RTP over DTLS [9]. The core of this thesis is the design, implementation and testing of a prototype of RTP over DTLS.
1.2 Goals
The goal of this thesis is to take part in the design of a unified media security framework for Internet Telephony, using RTP and DTLS. The focus lies on the interaction of the RTP and DTLS components of the framework. A critical aspect in terms of the efficiency of the implementation is packet loss. Packet loss in media streaming occurs when data packets do not arrive within the time limit for being inserted into the data stream.
1 http://www.ietf.org/
The critical aspect here is the delay, i.e. the total time it takes to transmit voice data from caller to callee. The recommended threshold for delays in Telephony is 150 milliseconds according to the International Telecommunication Union Standardisation Sector (ITU-T²). For Telephony a packet loss rate of up to 5% is still acceptable according to the ITU-T. Therefore the implementation of RTP over DTLS shall provide a packet loss rate lower than 5%.

A prototype of a VoIP application using RTP over DTLS was implemented in order to determine whether RTP traffic can be transmitted effectively over DTLS without compromising the quality of the call. This prototype was developed based on existing implementations of RTP and DTLS. The technical premises and detailed requirements for this implementation had to be analysed to arrive at an adequate approach. The prototype was tested for the critical aspects in order to determine the usability of the approach and to pave the way for further development of the unified media security framework.
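The two targets stated above can be expressed as a simple acceptance check. The following is a minimal sketch and not part of the prototype: the function and parameter names are hypothetical and the sample delay components are invented for illustration; only the 150 ms and 5% thresholds come from the text.

```python
# Illustrative check of the two quality targets named in the text.
# The thresholds are taken from the text; all names and sample values
# below are assumptions made for this sketch.

MAX_ONE_WAY_DELAY_MS = 150.0   # recommended delay threshold (ITU-T, per the text)
MAX_LOSS_RATE = 0.05           # acceptable packet loss rate (per the text)


def one_way_delay_ms(digitise_ms, packetise_ms, transmit_ms):
    """Delay as the sum of digitising, packetising and transmitting."""
    return digitise_ms + packetise_ms + transmit_ms


def loss_rate(packets_sent, packets_played):
    """Fraction of packets that were lost or arrived too late to be played."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    return (packets_sent - packets_played) / packets_sent


def call_quality_ok(delay_ms, loss):
    """True if both targets are met (loss must be strictly below 5%)."""
    return delay_ms <= MAX_ONE_WAY_DELAY_MS and loss < MAX_LOSS_RATE


if __name__ == "__main__":
    delay = one_way_delay_ms(digitise_ms=5.0, packetise_ms=20.0, transmit_ms=60.0)
    loss = loss_rate(packets_sent=1000, packets_played=970)
    print(f"delay={delay} ms, loss={loss:.1%}, ok={call_quality_ok(delay, loss)}")
```

A packet that arrives after its playout time counts as lost here, which matches the definition of packet loss in media streaming given in this section.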
1.3 Thesis Organisation

This thesis is organised as follows. Chapter 2 gives an overview of the basic VoIP components as an introduction and to aid the understanding of this thesis. Chapter 3 presents related work and discusses alternative concepts for securing Internet Telephony. Chapter 4 provides security considerations and an overview of possible attacks on VoIP traffic. Chapter 5 presents the general concept of RTP over DTLS. Chapter 6 deals with the implementation design and the choice of libraries. Chapter 7 describes the implementation process along with the problems encountered and explains in detail how the libraries work together. In Chapter 8 the prototype is tested as an evaluation of the approach. Chapter 9 contains the conclusion and deals with open issues and future work.
2 http://www.itu.int/ITU-T/
2 Background
This chapter describes the basic concepts that are necessary to understand this thesis. An introduction to Internet Telephony is given along with a description of the main protocols used in this thesis. Due to space limitations the level of detail is kept moderate; interested readers are referred to the references of this thesis.
2.1 Voice over IP

VoIP, also called IP Telephony, Internet telephony, Broadband telephony, Broadband Phone and Voice over Broadband, is the routing of voice conversations over the Internet or any other IP-based network. The first transmissions of digitised audio data from one computer to another were achieved in 1973 in the Advanced Research Projects Agency Network (ARPANET¹) with a throughput of 3,490 bit/s [10]. A VoIP call is established in a similar way to a traditional phone call. There are three general phases: connection initiation, transmission of voice and connection termination. Initiation and termination are handled by a signalling protocol. Common signalling protocols today are the Session Initiation Protocol (SIP) [11], which is presented in detail in an upcoming section, H.323 [12], IAX (InterAsterisk eXchange) [10] and Skype² [13].

1 http://www.darpa.mil/
2 http://www.skype.com

To initiate a VoIP call with SIP, the caller invites the callee to a so-called session. In order to establish a connection, the caller needs to know whether and where the callee is available. Subscribers of a SIP provider have a so-called SIP Uniform Resource Identifier (SIP-URI). These addresses use the URI format and are similar to e-mail addresses (e.g. sip:username@example.com). Before a user can call another user or receive a call, the terminal device must register with the central server of the SIP provider and thereby inform the provider that the user is online and ready to receive calls. The server now has information about the location of the logged-in user, so the user is reachable through the server by other SIP users. For connection initiation the caller sends an invite message to the server, which forwards it to the callee, whose terminal device then rings. When the call is accepted, an accept message is sent back to the caller along with the current IP address of the callee. Once the session is initiated, the servers are no longer needed for it. The media channel is now established directly between the participants with RTP. A detailed description of a VoIP call initiation using SIP is provided in the upcoming SIP section.

It is generally possible (e.g. with SIP) to establish a connection directly between caller and callee without servers, but then the IP address of the callee must be known to the caller. This is impractical, as experience with telephone numbers and the Internet shows: nobody remembers a website by its IP address, but by its domain name. A name is much more easily associated with a person or company and much easier to remember. Furthermore, IP addresses depend on the user's location (e.g. at work or at home). The transport of the audio data is achieved with the Real Time Transport Protocol (RTP) [6], which is presented in detail in an upcoming section.
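The registration and invitation steps described above can be modelled with a small in-memory lookup table. This is only a toy sketch under strong assumptions: it is not a SIP implementation, there is no message parsing or network I/O, and the class name, method names and addresses are hypothetical.

```python
# Toy model of the registration and invitation steps described above.
# This is not a SIP implementation: there is no message parsing and no
# network I/O, and every class and method name here is hypothetical.


class ToyRegistrar:
    """Maps SIP-URIs to the current contact address of a terminal device."""

    def __init__(self):
        self._locations = {}

    def register(self, sip_uri, contact_addr):
        # A terminal device announces that it is online and where it is.
        self._locations[sip_uri] = contact_addr

    def unregister(self, sip_uri):
        self._locations.pop(sip_uri, None)

    def invite(self, caller_uri, callee_uri):
        """Return the callee's contact address, or None if not reachable."""
        if caller_uri not in self._locations:
            return None  # the caller has to be registered as well
        return self._locations.get(callee_uri)


if __name__ == "__main__":
    server = ToyRegistrar()
    server.register("sip:alice@example.com", ("192.0.2.10", 5060))
    server.register("sip:bob@example.com", ("198.51.100.7", 5060))
    contact = server.invite("sip:alice@example.com", "sip:bob@example.com")
    print("media channel can now be set up directly with", contact)
```

Once the caller has obtained the contact address, the server plays no further role in this model, which mirrors the statement that the media channel is established directly between the participants.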
RTP divides the audio data stream into small packets, which are then transmitted via IP, usually directly from speaker to listener. There an audio stream is generated from the received data packets and played to the receiver.

In enterprises VoIP is increasingly used to reduce infrastructure costs, since only one network infrastructure is needed instead of two, one for IP and one for Telephony. For enterprises and private users a great benefit is the saving on telephone call costs. Calls from VoIP to VoIP are normally free. Enterprises therefore tend to use VoIP for internal communication and traditional Telephony for outbound calls. Connections to landline phones are possible through gateway services (provided e.g. by SIP providers), but these connections are usually charged. To make their customers reachable from a traditional phone through such a gateway, providers offer them a landline phone number in addition to their address. As with e-mail, users are reachable worldwide through the same address or telephone number regardless of their current location, as long as they are connected to the Internet. A large variety of devices that can connect to networks can be used as terminal devices (IP phones, cellphones, PCs, PDAs, analogue phones with special adapters, ...). Another benefit of VoIP is the flexibility provided by the open standards, which allows new services to be added to VoIP easily. With reduced costs, increased reachability, flexibility and additional services like video calls, VoIP will play a significant role in the future of Telephony.
2.2 Real Time Transport Protocol

In VoIP, RTP [6] is the commonly used protocol for the transmission of audio data. RTP provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video or simulation data. These services include payload type identification, sequence numbering, time stamping and delivery monitoring. Typically RTP runs over UDP in order to achieve timely delivery of the data packets. TCP is not used because of its retransmission mechanism: waiting for retransmitted packets so that data can be delivered in order leads to head-of-line blocking in the media stream, which delays packets that arrived in time. RTP does not provide any mechanism to ensure timely delivery or other quality of service guarantees, but relies on lower-layer services to do so (e.g. NSIS³ [14]).
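The effect of head-of-line blocking can be illustrated with a few lines of arithmetic. The sketch below is a simplified model rather than a network simulation: the arrival times, the playout deadlines and the function names are assumptions chosen for illustration.

```python
# Sketch contrasting in-order reliable delivery (TCP-like) with
# deliver-on-arrival plus a playout deadline (RTP-over-UDP-like).
# The arrival times are invented for illustration.


def in_order_delivery_times(arrival_ms):
    """Time each packet reaches the application when packets must be
    handed over strictly in sequence: a packet waits for all earlier ones."""
    delivered = []
    latest = 0.0
    for t in arrival_ms:           # index in the list == sequence number
        latest = max(latest, t)
        delivered.append(latest)
    return delivered


def playable_on_arrival(arrival_ms, playout_deadline_ms):
    """Sequence numbers that can still be played when every packet is
    handed over on arrival and late packets are simply discarded."""
    return [seq for seq, (t, deadline)
            in enumerate(zip(arrival_ms, playout_deadline_ms))
            if t <= deadline]


if __name__ == "__main__":
    # Packet 2 is lost and its retransmission only arrives at 260 ms.
    arrivals = [50.0, 70.0, 260.0, 110.0, 130.0]
    # Each 20 ms chunk must be available 100 ms after it was sent
    # (sent at 0, 20, 40, ...); this buffer size is an assumption.
    deadlines = [100.0, 120.0, 140.0, 160.0, 180.0]

    print(in_order_delivery_times(arrivals))
    print(playable_on_arrival(arrivals, deadlines))
```

With in-order delivery, packets 3 and 4 are held back until the retransmitted packet 2 arrives and therefore miss their own deadlines, although they reached the host in time. With delivery on arrival, only packet 2 is dropped.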
3 http://tools.ietf.org/wg/nsis/
VoIP applications using RTP require at least two participants who communicate by transmitting multimedia (voice and/or video) data to each other. An association among a set of participants communicating with RTP is called an RTP session or conference. A participant may be involved in multiple RTP sessions at the same time.

The data transport of RTP is augmented by the RTP Control Protocol (RTCP) [6] to allow monitoring of data delivery in a manner scalable to large multicast networks, and to provide minimal control and identification functionality. RTCP is based on the periodic transmission of control packets to all participants in the session. Its primary function is to provide feedback on the quality of the data distribution. As its second function, RTCP carries a persistent transport-layer identifier for an RTP source called the canonical name, or CNAME. While other identifiers, such as the SSRC explained later, may change during a session, the CNAME remains the same. It is used to identify a participant during a session. Because each participant sends its control packets to all other participants of a session, each can independently observe the number of participants. This number is used to calculate the rate at which the packets are sent: more users in a session result in less frequent transmission of RTCP packets by each participant. This is necessary because otherwise the RTCP traffic could take bandwidth from the connection and CPU cycles that are needed by the RTP data traffic.

To establish an RTP session a pair of ports is reserved, one for audio data and the other for control (RTCP) packets. The audio conferencing application (the so-called VoIP phone) is used by each RTP session participant and sends audio data in small chunks of approximately 20 ms duration. Each chunk of audio is preceded by an RTP header indicating what kind of audio encoding (e.g.
PCM, ADPCM or LPC) is contained in each packet, so that senders can change the encoding during a conference. To cope with lost packets and delays, the RTP header contains timing information and a sequence number that allow the receivers to reconstruct the timing produced by the source. Hence the audio stream can be played out continuously. Conferences with both audio and video are realised by transmitting each in a separate RTP session.

If one of the participants of an RTP session has a lower-bandwidth connection to the network than the others, an RTP proxy (a so-called mixer) can be used to solve this issue. The mixer is placed near the low-bandwidth area and resynchronises incoming audio packets from multiple sources into a single audio stream. The audio data can then be compressed further using a different codec, so that the user with the low-bandwidth connection can receive packets from multiple sources. Mixers can also be used to compose the video streams of multiple sources into a single stream showing a group scene of the participating users.

The source of a stream of RTP packets is identified by a 32-bit numeric value in the RTP header, called the Synchronisation Source (SSRC) identifier. It is therefore independent of the network address. Since all packets from an SSRC form part of the same timing and sequence number space, a receiver can group packets by SSRC for playback. The outgoing RTP packets of a mixer are identified by the mixer's SSRC value.

The structure of an RTP packet is shown in Figure 2.1. The RTP payload consists of the media being transmitted. The RTP header contains information related to the payload, e.g. the source and the encoding. The RTP packet is wrapped in a UDP packet, which is encapsulated in an IP packet to be transferred over an IP-based network.

Figure 2.1: Structure of an RTP packet
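The header fields named in this section (payload type, sequence number, timestamp and SSRC) can be made concrete with a short packing and parsing routine. The sketch below follows the fixed 12-byte header layout of RFC 3550 and is not taken from the thesis prototype; the function names and sample values are assumptions, and header extensions and padding are deliberately not handled.

```python
import struct

# Fixed 12-byte RTP header as defined in RFC 3550 (no CSRC list, no
# header extension). Function names and sample values are assumptions.

RTP_VERSION = 2
_HEADER = struct.Struct("!BBHII")  # V/P/X/CC, M/PT, sequence, timestamp, SSRC


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    byte0 = RTP_VERSION << 6                      # padding, extension, CC all zero
    byte1 = (0x80 if marker else 0) | (payload_type & 0x7F)
    header = _HEADER.pack(byte0, byte1, seq & 0xFFFF,
                          timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_packet(packet):
    if len(packet) < _HEADER.size:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, seq, timestamp, ssrc = _HEADER.unpack_from(packet)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("unsupported RTP version")
    csrc_count = byte0 & 0x0F
    offset = _HEADER.size + 4 * csrc_count        # skip CSRC identifiers, if any
    return {
        "payload_type": byte1 & 0x7F,
        "marker": bool(byte1 & 0x80),
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:],
    }


if __name__ == "__main__":
    # 20 ms of 8 kHz audio is 160 samples; payload type 0 is PCMU.
    pkt = build_rtp_packet(0, seq=1, timestamp=160, ssrc=0x1234ABCD,
                           payload=b"\xff" * 160)
    print(len(pkt), parse_rtp_packet(pkt)["sequence"])
```

A receiver would typically group parsed packets by the "ssrc" field and order them by "sequence" before playback, as described above.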
2.3 SSL/TLS and DTLS

The first versions of SSL were developed by Netscape⁴ as a security protocol for Internet traffic with the Netscape Internet browser. Netscape's competitor Microsoft⁵ developed its own security protocol, the Private Communications Technology (PCT), which was derived from the second version of SSL. In May 1996 the IETF chartered the Transport Layer Security (TLS) working group to standardise an SSL-like protocol and harmonise the different approaches, with the result that SSL has since been developed further under the name TLS. Today TLS is a widely deployed protocol for securing network traffic. It is used for protecting Internet traffic (e.g. Internet banking) with the Hyper Text Transfer Protocol Secure (HTTPS) [15] and for e-mail protocols. It provides a secure channel to applications with three primary security features:
- Authentication of the server
- Confidentiality of the communication channel
- Message integrity of the communication channel
Optionally TLS can provide authentication of the client. Public key based digital signatures are used, which are backed by certificates. The server authenticates itself by decrypting a secret encrypted under its public key or by signing a random challenge. The TLS handshake is a conventional two round trip algorithm negotiation and key establishment protocol. The most common variant is the RSA based handshake [16]. Figure 2.2 presents the handshake, which can be divided into four phases. In phase 1 the client sends a client_hello to the server, which responds with a server_hello. In these messages the latest supported protocol version is transmitted to negotiate the version to be
4 http://www.netscape.com
5 http://www.microsoft.com
Figure 2.2: Schematic representation of the SSL handshake protocol with two way authentication with certificates [1].
Figure 2.3: DTLS in the TCP/IP stack
used, a 32 byte random value that later enters the key generation, the Session Identifier (Session ID) and the cipher suite to use. Phases two and three are optional. In phase two the server identifies itself to the client with a certificate. In phase three the client identifies itself to the server if a certificate is available. Additionally the client verifies the server certificate, which contains the public key of the server. If the certificate cannot be verified, the connection is closed. The handshake is finished in phase four with the generation of the Master Secret, a single-use symmetric key that is used during the connection for encryption and decryption of messages. From now on all messages are transmitted encrypted.

With the rising popularity of VoIP and other multimedia services it became necessary to use TLS with the faster UDP protocol as well. TLS itself could not be used directly, because after a packet loss the following data packets can no longer be decrypted. Datagram Transport Layer Security (DTLS) [8], which was standardised in April 2006, is a datagram-capable version of TLS and is therefore extremely similar to TLS. The DTLS protocol allows client/server applications to communicate in a way that is designed to prevent eavesdropping, tampering and message forgery. DTLS reuses almost all the protocol elements of TLS, with minor but important modifications that make it work properly with unreliable transport protocols. Figure 2.3 shows DTLS in the five-layer TCP/IP protocol stack. DTLS packets are structured as shown in Figure 2.4. In contrast to TLS, in the DTLS
Figure 2.4: DTLS packet structure
handshake protocol a stateless cookie exchange is used to prevent denial of service. Additionally, message fragmentation and reassembly were added. DTLS handshake messages may be lost, since transmission takes place over datagram transport; therefore DTLS needs a mechanism for retransmission during the handshake. This is achieved by incorporating a timer at each endpoint. Each endpoint keeps retransmitting its last message until a reply is received.

Furthermore DTLS, unlike TLS, is vulnerable to two types of denial of service attacks. The first is a standard resource consumption attack. The second is an amplification attack, where the attacker sends a client_hello message apparently sourced by the victim. In order to avoid these attacks, DTLS uses the cookie exchange technique that has been used in protocols such as Photuris [17]. Before the handshake proper begins, the client must replay a cookie provided by the server in order to demonstrate that it is capable of receiving packets at its claimed IP address. The DTLS client_hello message contains a cookie field, which is empty if there is no cached cookie from a prior exchange. The message also contains the DTLS version and a list of algorithms and compression methods that the client will accept. The server responds with three messages: the server_hello contains the server's choice of version and algorithms, the certificate message contains the server's certificate chain, and the server_hello_done informs the client that the server has finished sending its hello messages.

A state machine implements the retransmission timer and the resulting retransmissions. Figure 2.5 shows this state machine. Once in the Read Message Fragment state, transitions are triggered by the arrival of data fragments or the expiration of the retransmission timer. If a data fragment is the expected next handshake message, the fragment is returned to the higher layers and the timer is cancelled. Otherwise, the fragment is buffered or discarded as appropriate and the timer is allowed to continue ticking. When the retransmit timer expires, the implementation retransmits the last messages that it transmitted.

Figure 2.5: DTLS state machine [2]

DTLS is well suited for use with VoIP because it combines the security of TLS with the fast delivery of UDP, filling a gap left by the existing protocols.
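The idea behind the stateless cookie exchange can be sketched in a few lines. The following is an illustration of the technique only and not the OpenSSL API or the code of the prototype: the function names, the message encoding and the use of HMAC-SHA-256 are assumptions made for this sketch.

```python
import hashlib
import hmac
import os

# Stateless cookie in the spirit of the DTLS cookie exchange: the server
# keeps only one secret and recomputes the cookie instead of storing
# per-client state. This is not the OpenSSL API; all names are hypothetical.

SERVER_SECRET = os.urandom(32)


def make_cookie(client_ip, client_port, hello_params, secret=SERVER_SECRET):
    """Cookie = HMAC(secret, client address || client hello parameters)."""
    msg = f"{client_ip}|{client_port}|".encode() + hello_params
    return hmac.new(secret, msg, hashlib.sha256).digest()


def handle_client_hello(client_ip, client_port, hello_params, cookie,
                        secret=SERVER_SECRET):
    """Return ('verify', cookie) for a first hello, or ('proceed', None)
    once the client has echoed a valid cookie from its claimed address."""
    expected = make_cookie(client_ip, client_port, hello_params, secret)
    if cookie and hmac.compare_digest(cookie, expected):
        return "proceed", None
    return "verify", expected


if __name__ == "__main__":
    params = b"version+ciphers"                      # placeholder hello content
    step, cookie = handle_client_hello("192.0.2.10", 40000, params, b"")
    print(step)                                      # first hello: cookie requested
    step, _ = handle_client_hello("192.0.2.10", 40000, params, cookie)
    print(step)                                      # echoed cookie: handshake continues
```

Because the cookie depends on the sender's address, a client_hello with a forged source address cannot present a valid cookie unless the attacker can also receive packets sent to that address, and the server allocates no state before the cookie has been verified.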
2.4 Session Initiation Protocol SIP

The Session Initiation Protocol is a protocol that enables multi-user communication sessions regardless of media content. SIP is specified by the IETF in RFC 3261 [11]. SIP emerged in the mid-1990s from the research of a group of people, some of whom were also involved in the specification of RTP. SIP is an application-layer control protocol that can establish, modify and terminate multimedia sessions such as VoIP calls. SIP transparently supports name mapping and redirection services, which support personal mobility. SIP thereby covers the basic requirements of communications:
- User location
- User availability
- User capabilities
- Session setup
- Session management
User location: SIP determines the location of a user by a registration process. When a VoIP
phone is activated, it sends a registration to the SIP server, announcing its availability to the communications network.
User availability: Determines whether a user is willing to answer a request to communicate. A user can have several locations registered but might only accept incoming communications on one device. If a call is not answered there, it is transferred to another device or to an application such as voice-mail.
User capabilities: There are many methods and standards for multimedia communication. This function checks the user's capabilities, for example whether a camera for video calls is available or which encryption/decryption methods the user supports.
Session setup: SIP establishes the session parameters for both ends of the communication, and the actual session when one user calls and another user answers.
Session management: This covers, for example, the transfer of a call from one device to another (e.g. from a laptop to a mobile phone and vice versa) without noticeable impact on the communication partner. Another example is inviting a third user to a VoIP session, thereby establishing a conference call (multi-user session).

SIP is not a vertically integrated communications system. It is rather a component that can be used with other IETF standards, such as RTP, to build a complete multimedia architecture. An important feature of SIP is that it does not define the type of session that is being established, only how it should be managed. This flexibility means that SIP can be used for a huge number of applications and services. To date, the 3G community (see footnote 6) has selected SIP as the session control mechanism for the next generation of cellular networks. Microsoft has chosen SIP for its real-time communications strategy and has deployed it in various products.

6 http://www.3gpp.org/

There are four major components in the SIP architecture:
- SIP User Agents
- SIP Registrar Servers
- SIP Proxy Servers
- SIP Redirect Servers
These components exchange messages that embed Session Description Protocol (SDP) [18] descriptions, which define their content and characteristics, in order to complete a SIP session.

The terminal devices of SIP are called SIP User Agents (UAs). A UA can be any kind of device capable of transmitting voice or other media over a network (e.g. cell phones, PCs, PDAs). These devices are used to create and manage a SIP session. Every User Agent needs a unique identifier, called a SIP URI. Like e-mail addresses, SIP addresses use the URI format: sip:user@example.com. Another addressing scheme is the URL for telephone calls (tel URI), described in [19], by which a traditional phone number can be mapped to a SIP address. This is used by the gateway servers that many SIP providers maintain in order to enable traditional phone users to call VoIP users. Basically, a connection is established when a User Agent Client (caller) sends an invitation message and the User Agent Server (callee) responds to it. This initiation can take place directly (peer-to-peer) if the current IP address of the User Agent Server is known. For the user it is more comfortable to initiate the session through the SIP provider using a SIP URI.

The SIP Registrar Servers are databases that contain the location of all User Agents within a domain. These servers retrieve participants' messages and other information and send them to the SIP Proxy Server. SIP Proxy Servers accept session requests made by a SIP User Agent and query the SIP Registrar Server to obtain the addressing information of the recipient's User Agent. The SIP Proxy Server then forwards the session invitation directly to the recipient User Agent if it is located in the same domain, or to another Proxy Server if it is located in another domain. The SIP Redirect Servers allow SIP Proxy Servers to redirect SIP session invitations to external domains. The SIP Redirect Server, the SIP Registrar Server and the SIP Proxy Server may reside on the same hardware.

Figure 2.6: Initialisation of a SIP session

Figure 2.6 illustrates the establishment of a SIP session between two Internet Service Providers (ISPs). Before any session can be established through a SIP provider, both users must power on their devices and register their availability and their IP addresses with the SIP Proxy Server in their ISP's network. User A initiates the call by sending the Proxy Server in domain A.com a request to communicate with User B.
1. Upon reception of the request from User A, the SIP Proxy Server in Domain A recognises that User B is outside its domain.
2. SIP Proxy Server A then queries the SIP Redirect Server, which can be located in Domain A or B, for User B's address. Note that this lookup is not a SIP query; it is, for instance, a DNS lookup.
3. The SIP Redirect Server returns the address of User B's Proxy Server.
4. The SIP Proxy Server in Domain A forwards the session initiation request to the SIP Proxy Server in Domain B.
5. The SIP Proxy Server in Domain B requests the current IP address of User B from the Registrar Server in Domain B.
6. The Registrar Server returns User B's current address.
7. The SIP Proxy Server relays User A's invitation to User B. This request includes information about the media (audio and/or video), which is described using SDP.
8. User B informs the SIP Proxy Server that User A's invitation is accepted and that User B is ready to receive the call.
9. The response from User B is forwarded back towards User A. The return path is known because every server recorded its address in a dedicated field of the invitation.
10. The response from User B is delivered to User A.
11. User A and User B create a point-to-point RTP connection, enabling them to interact.
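To make step 7 more concrete, the following sketch builds a minimal SIP INVITE carrying an SDP media description. It is only an illustration under stated assumptions: the addresses, tag, branch value and Call-ID are made-up placeholders, build_invite is a hypothetical helper, and a real User Agent would add further headers and generate unique identifiers.

```python
def build_invite(caller: str, callee: str, caller_host: str,
                 rtp_port: int, call_id: str) -> str:
    """Assemble an INVITE request whose body is an SDP audio offer."""
    # SDP body: announces where the caller wants to receive RTP audio.
    sdp = "\r\n".join([
        "v=0",
        f"o=- 0 0 IN IP4 {caller_host}",
        "s=call",
        f"c=IN IP4 {caller_host}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 18",  # 18 is the static payload type of G.729
        "",
    ])
    user = caller.split("@")[0]
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        # Placeholder branch value; it only has to start with the magic cookie.
        f"Via: SIP/2.0/UDP {caller_host};branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1234",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{caller_host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode())}",
    ]
    # An empty line separates the SIP headers from the SDP body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


if __name__ == "__main__":
    print(build_invite("alice@a.example.com", "bob@b.example.com",
                       "192.0.2.10", 49170, "abc123@192.0.2.10"))
```

Each proxy on the path would prepend its own Via header to such a request, which is how the return path mentioned in step 9 becomes available to the response.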
3 Related Work
The following chapter presents related work, in particular alternative approaches to securing VoIP traffic. The IPsec protocol is presented and compared to the chosen DTLS protocol, along with the reasons for this choice.
3.1 Security in VoIP

Securing a traditional phone is neither easy nor cheap, since additional devices would have to be installed to secure the communication channel. Both communication partners would need such a device, which results in high additional costs on both sides. It is much easier to add a security service to a VoIP phone: both communication partners can download and install the software and are then able to communicate over a secured channel without much effort. Surprisingly, many consumer VoIP solutions do not yet support any encryption. Hence it is not a complex task to eavesdrop on VoIP calls and even to change their content [10]. There are open source tools that facilitate sniffing of VoIP conversations. One example is the Voice Over Misconfigured Internet Phones (VOMIT) software (see footnote 1) [20], which enables even non-expert users to eavesdrop on VoIP calls easily. The software extracts the audio data from a stream of data transmitted over an insecure network such as the Internet. Some vendors use compression to make eavesdropping more difficult.

The existing secure standard SRTP [21] and the new ZRTP [22] protocol are available on Analogue Telephone Adapters (ATAs) as well as in various softphones. Although some devices support SRTP and thus enable encrypted VoIP calls, the problem is that in the standard configuration the keying material is transmitted unencrypted, in clear text, over the network. Eavesdroppers are thereby able to access the keying material, which makes the encryption (almost) useless. Furthermore, users need to study the manual to find out how to enable secured key sharing [23]. It is possible to use IPsec to secure peer-to-peer VoIP by using opportunistic encryption, which will be presented in the coming section.

Skype, a proprietary peer-to-peer Internet telephony network with over 200 million users worldwide, is closed source, which means that its source code is not published. Skype does not use SRTP, but uses encryption which is transparent to the Skype provider. The user cannot turn encryption on or off and has to rely on the software and the provider. The Voice VPN solution provides secure voice for enterprise VoIP networks by applying Internet Protocol Security (IPsec) [24] encryption to the digitised voice stream [10]. IPsec will be explained in the upcoming section as an alternative approach to securing VoIP traffic.

1 http://vomit.xtdnet.nl/
3.1.1 Internet Protocol Security, IPsec

Internet Protocol Security [24] is a suite of protocols for securing Internet Protocol (IP) [3] communications by authenticating and/or encrypting each IP packet in a data stream. Additionally, IPsec includes protocols for cryptographic key establishment. IPsec was developed in 1998 as an approach to address the security shortcomings of IP. IPsec provides confidentiality, authenticity and integrity. The main IPsec document describes the architecture of the protocol suite and references the following RFCs, upon which IPsec relies: Authentication Header (AH) [33], Encapsulating Security Payload (ESP) [25] and Internet Key Exchange (IKE) [26].
Figure 3.1: IPsec in the TCP/IP stack
IPsec operates on the network layer and is therefore capable of securing TCP- and UDP-based protocols, which reside on the transport layer, as illustrated in Figure 3.1. IPsec operates in two different modes: transport mode and tunnel mode. In transport mode, only the payload of the data packet is encrypted and/or authenticated. Routing remains intact, since the IP header is neither modified nor encrypted. Transport mode is used for peer-to-peer communication. In tunnel mode, the entire packet is encrypted and/or authenticated; it must therefore be encapsulated in a new IP packet for routing to work. Tunnel mode is used for peer-to-peer communication as well as for network-to-network and host-to-network connections. The first step upon connection initiation is the exchange of the keying material. For this, IKE, possibly the most complex component of IPsec, is used. IKE uses the Diffie-Hellman Key Agreement Method [27] to exchange keys over an insecure network and is based on the Internet Security Association and Key Management Protocol (ISAKMP) [28], the IPsec Domain of Interpretation (DOI) [29], the Oakley Key Determination Protocol [30] and SKEME [31]. Both sides of the connection need to authenticate themselves to the other side and agree on a keying algorithm. AH guarantees connectionless integrity and data origin authentication of IP datagrams. It can optionally protect against replay attacks by using the sliding window technique and discarding old packets. AH protects the IP payload and all header fields of an IP datagram except those that might change during transmission. Figure 3.2 shows a TCP packet before and after the AH is inserted.
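The sliding window technique mentioned above for replay protection can be sketched as follows. This is a simplified illustration, not taken from any IPsec implementation: the class name ReplayWindow and the default window size of 64 are assumptions, and a real receiver would run this check only on packets whose integrity has already been verified.

```python
class ReplayWindow:
    """Track which sequence numbers inside a sliding window were already seen."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set means (highest - i) was already seen

    def accept(self, seq: int) -> bool:
        """Return True for a fresh packet, False for a replayed or too old one."""
        if seq <= 0:
            return False
        if seq > self.highest:
            # Newer than anything seen: slide the window forward.
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # fell out of the window: treated as old
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset
        return True


if __name__ == "__main__":
    window = ReplayWindow()
    for seq in (1, 3, 2, 2, 100, 36, 37):
        print(seq, window.accept(seq))
```

Packets may arrive out of order within the window and are still accepted once, which matters for datagram traffic where reordering is normal.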
Figure 3.2: Structure of an IPsec packet with AH
Figure 3.3: Structure of an IPsec packet with ESP
The ESP protocol provides origin authenticity, integrity and confidentiality of a packet. Unlike AH, ESP does not protect the IP packet header. Figure 3.3 shows a TCP packet before and after ESP is applied in tunnel mode. IPsec support is usually implemented in the kernel, while key management is carried out from user space. However, as there is a standard interface for key management, it is possible to control one kernel IPsec stack using key management tools from a different implementation. IPsec is part of IPv6. It was intended to provide either transport mode or tunnel mode, where packets can be provided to several machines; furthermore, it can be used to create Virtual Private Networks [32]. In comparison to TLS, IPsec is a peer-to-peer protocol, designed as a generic security mechanism for Internet protocols. There are a number of problems with using IPsec to secure datagram traffic generated by client server applications
which will be discussed in the comparison of IPsec and DTLS in the next section.
3.1.2 Comparison between IPsec and DTLS

In contrast to DTLS, IPsec consists of three protocols: Authentication Header (AH) [33], Encapsulating Security Payload (ESP) [25] and Internet Key Exchange (IKE) [26]. These technologies work together to provide security for IP traffic. The IETF first standardised IPsec in [28], the Internet Security Association and Key Management Protocol (ISAKMP). The architecture of IPsec is described in [24]. Key exchange and parameter management in IPsec are provided by ISAKMP and IKE, while data protection is provided by AH and ESP. All these separate developments are connected through security associations (SAs). ISAKMP and IKE are used to establish SAs, which are used by AH and ESP to protect the data. An SA is roughly comparable to a DTLS session, except that an SA is unidirectional. Each SA has a unique 32-bit identification tag which is carried in each packet. IPsec has two methods of establishing an SA, manual and automatic keying, of which automatic keying is similar to the DTLS handshake. The IKE key exchange is based on STS [34], Oakley [30] and SKEME [31]. The two parties exchange Diffie-Hellman public keys and use the shared key to derive the traffic encryption and message authentication keys. One big disadvantage of IPsec is that it fails completely as soon as a router performs Network Address Translation (NAT). This technology allows a large number of users to share a small number of IP addresses. The machines behind a NAT router obtain private IP addresses, and the router translates the private address to a public address when a machine connects to a server in the Internet. Since the user's private IP address is not known outside the sub network behind the router and IPsec is used to create connections between machines, IPsec cannot establish a connection between the private IP address behind the router and an IP address in the Internet, since outside the subnet only the public address of
the sub network is known. Another problem is the lack of standardisation among IPsec APIs, which results in portability problems when an application wishes to control the keying policy. With DTLS, portability can be achieved even though DTLS APIs are not standardised either, since an application can be shipped along with the DTLS toolkit. For IPsec this is not so easily achievable, because it resides in kernel space, in contrast to DTLS, which resides in application space. In order to simplify key negotiation, some designs use a reliable TCP connection to secure a separate datagram channel. This design is smart but has some problems. First, the application now has to manage two different sockets and synchronise them, and synchronisation is a significant programming problem. If the TCP connection is left open after key negotiation, system resources are wasted. If, on the other hand, the TCP connection is closed after key negotiation, any renegotiation must be done over UDP, which requires a second implementation of the key negotiation over UDP and would make the key negotiation over TCP obsolete. It is therefore more useful to have key negotiation and data transfer on the same channel. DTLS is more suitable for securing RTP traffic: since RTP runs over UDP, any additional connection (e.g. TCP for key negotiation) is a waste of system resources. VoIP is time sensitive, so the added security overhead should cost as few system resources as possible while still providing enough security to be reliable. Furthermore, since IPsec resides in the kernel, using it on a system that does not support IPsec requires changes to the TCP/IP stack. To secure an application with DTLS, only another application-level library needs to be used.
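The Diffie-Hellman exchange on which IKE builds, as described in this comparison, can be sketched in a few lines. The sketch uses a deliberately tiny textbook group (p = 23, g = 5) that offers no security, and the key derivation with HMAC-SHA256 and the labels "encryption" and "authentication" is an assumption made for illustration, not the derivation IKE specifies.

```python
import hashlib
import hmac

# Toy group parameters for illustration only; real deployments use
# standardised groups with primes of 2048 bits or more.
P = 23
G = 5


def public_value(secret: int) -> int:
    """Value sent to the peer over the insecure network."""
    return pow(G, secret, P)


def shared_secret(peer_public: int, secret: int) -> int:
    """Both sides arrive at the same value without ever transmitting it."""
    return pow(peer_public, secret, P)


def derive_keys(shared: int):
    """Derive separate encryption and authentication keys (hypothetical KDF)."""
    seed = shared.to_bytes(4, "big")
    enc_key = hmac.new(seed, b"encryption", hashlib.sha256).digest()
    auth_key = hmac.new(seed, b"authentication", hashlib.sha256).digest()
    return enc_key, auth_key


if __name__ == "__main__":
    a_secret, b_secret = 4, 3  # private values, never sent
    a_pub, b_pub = public_value(a_secret), public_value(b_secret)
    assert shared_secret(b_pub, a_secret) == shared_secret(a_pub, b_secret)
    print(derive_keys(shared_secret(b_pub, a_secret))[0].hex())
```

The exchange on its own does not authenticate the peers, which is why both sides additionally have to prove their identity during key negotiation.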
3.2 Secure Real Time Transport Protocol

The Secure Real Time Transport Protocol (SRTP) [21] defines a profile for RTP, for which implementations already exist, that is intended to provide encryption, message authentication and integrity,
and replay protection to the RTP data in unicast and multicast applications. Note that SRTP must not be confused with RTP over DTLS. SRTP was published as RFC 3711 [21] in March 2004. This tightly coupled encryption mode for RTP provides a number of benefits. The RTP header is left unencrypted, which enables header compression (see [35], [36] and [37]) and easy debugging. The packets appear to be RTP packets, which is a benefit for firewall compatibility. Encryption adds no header overhead. SRTP relies on an external key management protocol to set up the initial master key. Two protocols specifically designed to be used with SRTP are ZRTP [22] and MIKEY [38]. There are also other methods of negotiating the SRTP keys; several vendors offer products that use the SDES key exchange method. For encryption and decryption of the data flow, SRTP standardises only a single cipher, the Advanced Encryption Standard (AES) [39]. AES can be used in two cipher modes, both of which turn the AES block cipher into a stream cipher. Since SRTP does not provide a keying mechanism and has to rely on other protocols, it cannot be regarded on its own as a solution for securing VoIP traffic. In combination with ZRTP, VoIP traffic can be secured. However, SRTP is not widely used, since users cite reduced audio quality as a reason to turn ZRTP protection off. Furthermore, ZRTP is not a widely known security architecture like TLS and is therefore not as trustworthy as RTP over DTLS can be.
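How a counter mode turns a block cipher into a stream cipher can be shown with a small sketch. Since the Python standard library contains no AES, the block function below is a stand-in built from HMAC-SHA256 truncated to 16 bytes; this substitution, the function names and the 4-byte counter layout are all assumptions of the sketch and do not reproduce the SRTP key stream definition.

```python
import hashlib
import hmac

BLOCK = 16  # AES block size in bytes


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Stand-in for AES_key(nonce || counter); NOT real AES."""
    data = nonce + counter.to_bytes(4, "big")
    return hmac.new(key, data, hashlib.sha256).digest()[:BLOCK]


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing data with the generated key stream."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        stream = keystream_block(key, nonce, offset // BLOCK)
        chunk = data[offset:offset + BLOCK]
        # zip() stops at the shorter input, so a final partial block
        # needs no padding.
        out.extend(b ^ s for b, s in zip(chunk, stream))
    return bytes(out)


if __name__ == "__main__":
    key, nonce = b"k" * 16, b"n" * 12
    payload = b"twenty bytes of G729"
    protected = ctr_xor(key, nonce, payload)
    print(len(payload), len(protected))
    assert ctr_xor(key, nonce, protected) == payload
```

Because the key stream is simply XORed onto the payload, the ciphertext has exactly the length of the plaintext, which is why this style of encryption does not enlarge the packet. A nonce must never be reused with the same key.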
4 Security Considerations for VoIP

As already mentioned, the transport of voice data over insecure networks such as the Internet is exposed to various threats. This chapter discusses these threats, pointing out attacks and the security goals that must be achieved in order to counter them.
4.1 Introduction
In order to classify the threats to VoIP properly, the security goals must first be formulated. VoIP is IP traffic, and thus the same attacks can be used against it as against any other IP traffic. This is why VoIP calls are vulnerable to a variety of threats that traditional telephone calls are not. Any data being transmitted is at some risk of being eavesdropped, and data packets can be intercepted anywhere along the transmission path. Alternatively, the intercepted data could be changed and then forwarded to the receiver, who would not notice that the data has been altered; this is called a man-in-the-middle attack. By transmitting the same message many times, e.g. an invitation to a VoIP phone call, an attacker could keep the receiving machine so busy that no real calls can come through. This is called a denial of service attack. There are three classical primary security goals in modern communication systems:
- Confidentiality
- Integrity
- Availability
Confidentiality has been defined by the International Organisation for Standardisation (ISO) as "ensuring that information is accessible only to those authorised to have access" [40]. Integrity is the protection against unauthorised alteration of the transmitted data. Message integrity, like confidentiality, is a part of DTLS. It assures the user that the received data has not been changed without their notice. Availability means that the transmitted data will reach its destination and will thereby be available to the receiver.

The integrity of the voice data is an important issue here. Certainly it is easier to recognise whether someone on the phone is the person he or she claims to be than to recognise whether an e-mail was really written by the declared sender. This argument, however, applies mostly to private communication and to communication among people who know each other well. Moreover, voice messages can be recorded, edited and replayed, so that the receiver does not notice that the caller is not the person he or she claims to be. Besides the voice data, the signalling data also needs to remain intact and unaltered. The identity of the caller and of the callee needs to be protected. If an attacker manages to manipulate his own identity, the callee may be shown a different caller ID upon reception of a call. This can be used to reach persons on the phone who usually do not take calls from just anybody (e.g. the chief executive of a company). By acquiring a fake identity, the billing of the VoIP provider can also be bypassed, and calls will be charged to the original owner of the account.
4.1.1 Confidentiality in VoIP

Confidentiality is an important security goal. In the context of VoIP the focus lies on the confidentiality of the voice data, which means that calls cannot be eavesdropped. In VoIP, confidentiality is threatened more than in traditional telephony. In traditional telephony the attacker needs to have physical access to the network. Traditional telephony runs mostly
over separate networks, while in VoIP the voice data is transmitted over the Internet, where all connected machines can potentially be accessed through security holes. Many protocols in traditional telephony are barely published; analysing and attacking traditional phone calls therefore requires special hardware and software. This reduces the number of people who are capable of eavesdropping on phone calls, but it does not make eavesdropping impossible.
4.1.2 Availability in VoIP

Availability in VoIP networks has two primary meanings. The first is the availability of the telephone service, which in the case of SIP means that the SIP Proxy Servers are available and able to initiate sessions properly. Availability may also be harmed by unwanted calls, a problem which will be explained in the upcoming section. The second aspect of availability is the quality of the VoIP call: both communication partners need to be able to understand each other clearly.
4.2 Threats and Attacks

As described in the preceding section, VoIP calls are threatened by the same attacks as other applications running over IP networks. This section therefore gives an overview of different attacks in order to classify the threats to VoIP calls. However, a new technology that offers new services to users also offers new possibilities to attackers. Attacks can be divided into two groups. The first group consists of passive attacks, which include eavesdropping on calls and sniffing messages that are transmitted over the Internet. Much more powerful is the second group, the active attacks, in which messages are manipulated during transmission or faked messages are sent. An example of such an active attack is the man-in-the-middle attack, where the attacker gains control of a router between two communicating systems and redirects the transmitted packets. So-called network scans or port scans are used to plan an attack by searching for weaknesses: the attacker sends
various requests to a network or host in order to acquire information needed for further steps, such as the operating system or the installed services. A so-called spoofing attack uses messages or data packets with faked information. For example, the IP address or MAC address of the sender can be changed so that the receiving machine assumes that the packet was sent from a trustworthy source. Another example is DNS spoofing, in which DNS answers are changed so that the requesting machine communicates with a machine the attacker has prepared. Denial of service attacks send request messages to servers in such high volumes that the servers' service is no longer available to regular users, targeting the availability of a system. VoIP might also become the target of new attacks that VoIP itself makes possible. Spam is a commonly known problem these days. Spamming is the abuse of e-mail to indiscriminately send unsolicited bulk messages; e-mail spam involves sending nearly identical messages to numerous recipients. As already mentioned, SIP uses an address format similar to that of e-mail, so the problem of e-mail spam might become a problem for VoIP in the future. VoIP spam is not yet an existing problem; nonetheless it receives a great deal of attention from marketers and the trade press. VoIP spam is also referred to as SPIT (Spam over Internet Telephony). The malicious users could be telemarketers or prank callers. There are currently rules for e-mail systems that block unwanted e-mail; such systems could (and probably will) also be applied to VoIP systems. SIP has been designed to support presence natively, so that callers know the availability of the callee before even attempting to initiate a call. The three security services are realised through DTLS and implemented in the OpenSSL library, which makes it a reasonable choice for securing VoIP traffic.
Unfortunately, no encryption can prevent the biggest threat: a virus or Trojan on the endpoint that gives an attacker access to the machine and thereby to the decrypted data.
5 RTP over DTLS

This chapter describes the basic idea of RTP over DTLS and the possibilities for its realisation. Furthermore, the performance of the approach is considered in comparison to SRTP.
5.1 Introduction to RTP over DTLS

RTP uses UDP to transmit data over IP-based networks. Implementations typically have interfaces to UDP socket classes to open and close sockets and to transmit and receive data. DTLS likewise uses UDP sockets for transmission and reception of data. RTP can therefore operate on top of DTLS instead of plain UDP once functionality for connection initiation is added. This adds an encryption scheme to RTP that provides key exchange and encryption/decryption of data. The basic idea is an interface, used by an alternative RTP class for the underlying transport protocol, that manages connection requests and connection acknowledgements. SIP softphones could then simply start an RTP over DTLS session as an alternative RTP profile instead of a standard RTP session. Since normal RTP and RTCP payloads are sent in a UDP packet, they can just as well be sent in a DTLS packet. An RTP packet sent over DTLS therefore has the layout shown in Figure 5.1.

Figure 5.1: Structure of an RTP packet sent over DTLS

RTP typically opens two sessions, one for data traffic and one for RTCP traffic. In order to secure RTP traffic, a DTLS session should be initiated at least for the data session. Securing the RTCP session would be possible as well, but since the RTCP packets do not contain confidential data, this is not mandatory. RTP over DTLS is a trustworthy approach to achieving secured VoIP calls. DTLS is practically designed for use in a VoIP scenario and, because of its well-known predecessor, is likely to gain the trust of users as well.
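The idea of an RTP class that talks to an exchangeable transport can be sketched as follows. The sketch assumes a minimal two-method transport interface; Transport, LoopbackTransport and RtpSession are hypothetical names chosen for illustration and are not the classes of the prototype. Only the 12-byte fixed RTP header layout is taken from the RTP specification. A DTLS-backed transport would implement the same two methods after completing its handshake.

```python
import struct
from typing import List, Protocol, Tuple


class Transport(Protocol):
    """What the RTP layer needs from UDP or DTLS (assumed interface)."""

    def send(self, data: bytes) -> None: ...

    def recv(self) -> bytes: ...


class LoopbackTransport:
    """In-memory stand-in so the sketch runs without a network."""

    def __init__(self) -> None:
        self.queue: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.queue.append(data)

    def recv(self) -> bytes:
        return self.queue.pop(0)


class RtpSession:
    """Builds and parses RTP packets, independent of the transport below."""

    def __init__(self, transport: Transport, ssrc: int, payload_type: int) -> None:
        self.transport = transport
        self.ssrc = ssrc
        self.payload_type = payload_type
        self.seq = 0
        self.timestamp = 0

    def send_frame(self, payload: bytes, samples: int) -> None:
        # Fixed header: V=2 (0x80), payload type, sequence, timestamp, SSRC.
        header = struct.pack("!BBHII", 0x80, self.payload_type & 0x7F,
                             self.seq & 0xFFFF, self.timestamp & 0xFFFFFFFF,
                             self.ssrc)
        self.transport.send(header + payload)
        self.seq += 1
        self.timestamp += samples

    def recv_frame(self) -> Tuple[int, int, bytes]:
        packet = self.transport.recv()
        _, _, seq, timestamp, _ = struct.unpack("!BBHII", packet[:12])
        return seq, timestamp, packet[12:]
```

With this separation the RTP code stays unchanged whether the packets leave through plain UDP or through a secured channel; only the object passed to RtpSession differs.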
5.1.1 SRTP Compatibility Mode

SRTP Compatibility Mode is a profile for RTP over DTLS which is presented in [9]. It depends on two extensions to DTLS which reduce the per-record bandwidth of the data channel and allow partial encryption of record bodies: Extensions for DTLS in Low Bandwidth Environments [41] and TLS Partial Encryption Mode [42]. In this profile, the RTP header is left unencrypted, which enables header compression. With unencrypted headers the packets appear as RTP packets, which results in firewall compatibility. Furthermore, this profile provides encryption with zero header overhead and thus improved performance in comparison to plain RTP over DTLS. For this profile, implementations need to negotiate the TLS partial encryption extension, the DTLS implicit application data header and the TLS MAC truncation extension. The RTP over DTLS packets would then look identical to SRTP packets with a 10-byte MAC value. They can only be distinguished with access to the DTLS or SRTP keying material. Since the RTP header is in the clear, header compression and debugging both work. The security properties of DTLS are not affected by these extensions. This extension to RTP over DTLS
is not part of the implementation conducted in this thesis but is worth noting for future development of the profile.
5.1.2 Packet size Comparison

This section provides a comparison of packet sizes in order to estimate the performance of RTP over DTLS in comparison to RTP, SRTP and the SRTP compatibility mode. Since most of the RTP infrastructure is reused, the overhead of SRTP is low. A 20 ms RTP packet encoded with the G.729 codec has a size of 60 bytes. With SRTP this packet would be just 4 bytes longer, but only as long as SRTP is used without a master key identifier. As already described in a previous section, this is not desired. With a master key identifier the SRTP packet has a size of 68 bytes. When DTLS is used, the same packet would be 98 bytes long, while in SRTP compatibility mode the packet size could be reduced to 70 bytes, only 2 bytes more than SRTP with a master key identifier. Therefore the SRTP compatibility mode should be added to RTP over DTLS in the future.
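The relative cost of each variant follows directly from the sizes quoted above. The short calculation below uses only those figures; the dictionary labels and the helper name overhead are chosen for illustration.

```python
# Packet sizes in bytes for one 20 ms G.729 frame, as stated in the text.
BASELINE = 60  # plain RTP
SIZES = {
    "RTP": 60,
    "SRTP without master key identifier": 64,
    "SRTP with master key identifier": 68,
    "RTP over DTLS": 98,
    "SRTP compatibility mode": 70,
}


def overhead(size: int):
    """Return the extra bytes and the increase in percent over plain RTP."""
    extra = size - BASELINE
    return extra, 100.0 * extra / BASELINE


if __name__ == "__main__":
    for name, size in SIZES.items():
        extra, percent = overhead(size)
        print(f"{name}: {size} bytes, +{extra} bytes ({percent:.1f} %)")
```

At 50 packets per second, which corresponds to 20 ms frames, every additional byte per packet costs 400 bit/s in each direction, so the difference between 98 and 70 bytes is noticeable on low-bandwidth links.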
5.1.3 Security Considerations

RTP over DTLS can be considered secure since DTLS is based on TLS, which has seen extensive security analysis. The handshake algorithm incorporated in DTLS works over an insecure channel; only the certificates have to be proved to be correct. In the standard authentication strategy of DTLS a PKIX [43] certificate is exchanged. When the client verifies the certificate, it checks whether the name in the certificate matches the server's domain name. This works where there is a relatively small number of servers with well-defined names, a situation which does not usually occur in the VoIP context [9]. Alternatively, the certificates could be self-signed, but then the client must be able to verify the server's certificate correctly and vice versa. An approach that addresses this uses SIP [11] and the Session Description Protocol (SDP) [18] and is described in [44] and [45].
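One conceivable way to verify a self-signed certificate is to compare its hash with a fingerprint that was carried in the signalling. The sketch below assumes exactly that and nothing more: whether [44] and [45] work this way in detail is not stated here, the function names are hypothetical, and the byte string used as a certificate is a placeholder rather than a real DER encoding.

```python
import hashlib
import hmac


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 hash of the certificate as colon-separated upper-case hex."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_signalled_fingerprint(cert_der: bytes, signalled: str) -> bool:
    """Compare the certificate seen in the handshake with the signalled value."""
    return hmac.compare_digest(fingerprint(cert_der), signalled.strip().upper())


if __name__ == "__main__":
    # Placeholder bytes standing in for a DER-encoded certificate.
    certificate = b"example self-signed certificate"
    announced = fingerprint(certificate)  # value the peer would announce
    print(matches_signalled_fingerprint(certificate, announced))
    print(matches_signalled_fingerprint(b"attacker certificate", announced))
```

Such a check is only as trustworthy as the signalling channel that carried the fingerprint, so the signalling itself would have to be integrity protected.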
6 Implementation Design
This chapter provides an analysis of requirements along with a description of the implementations chosen for this thesis, including the libraries used. The system idea is presented in more detail, along with the functionality and interaction of the individual components used for the prototype implementation.
6.1 Analysis of Requirements

The most important phase of a software project is the analysis. Empirical studies on failures of software projects have shown that unclear formulation of goals and requirements is by far the most common reason for failure. Small mistakes rooted in the early development stage and caused by inaccuracies can lead to big problems in the final stage of the development process because of error propagation. Detailed documentation of the requirements in the early stage of the development process is therefore indispensable as a guideline through the project. The requirements and goals formulated in this section are based on studies of the protocols and their implementations and on discussions with my supervisor, Prof. Dr. Xiaoming Fu. During development, requirements can be altered or extended in order to reach the goals and to react flexibly to problems on the way.
6.2 System Idea/Intent

The concept of the system to be implemented is based on H. Tschofenig's Internet Draft [9] and was discussed in meetings with H. Tschofenig and Professor Dr. X. Fu. The basic idea is to secure RTP data traffic using the DTLS protocol. In order to achieve this, a prototype needs to be implemented to prove that the idea works. In the following steps the surrounding software framework needs to be extended to support this option. In the first step, DTLS has to be investigated thoroughly in order to formulate the changes needed on the RTP side of the project. The second step involves an analysis of the RTP implementation and protocol to determine the functionality that the DTLS connection has to provide. In the subsequent step, the interfaces of the implementations are used to derive the implementation design of the project. Once audio data has been transmitted successfully with RTP over DTLS, the next step is to prove the functionality in a SIP application such as a softphone.
6.2.1 DTLS
The DTLS protocol is designed to secure data between communicating applications. It is designed to run in application space, without requiring any kernel modifications. DTLS normally uses one UDP socket per connection and endpoint. Therefore, upon connection initiation a socket is created at each endpoint before the DTLS handshake can be initiated. After successful completion of the handshake the sockets are ready to transmit and receive secured data. Upon termination of the connection both sockets are closed.
6.2.2 RTP
RTP cannot itself initiate a connection between two hosts. Therefore SIP is additionally used to initiate a session between two computers. Upon connection initiation RTP initialises two sessions on each host, one for data and one for RTCP traffic. Each of these sessions normally consists of two sockets, one for reception and one for transmission. Next the RTP stack is started, and packet transmission and reception run on each session until the execution of the RTP stack is stopped. Besides unicast conferences, RTP is also capable of multicast conferences. This feature cannot be mapped to a DTLS-secured session, since the key exchange protocol of DTLS is designed only for host-to-host communication, and the DTLS key exchange is one of the cornerstones of the benefits DTLS brings to the implementation. RTP data (and control) packets are usually transmitted via UDP; RTP therefore comes with an underlying transport layer similar to the one DTLS uses. Reuse of these functions will be reviewed in the upcoming design section in order to keep the changes small and simple.
6.2.3 SIP Softphone

The RTP media channel is initiated through SIP, as presented in figure 2.6 on page 25. Therefore the RTP over DTLS session will also be initiated by the SIP softphone client application. A softphone application is the best choice for testing the implementation framework. A media channel between two hosts will be established using the RTP stack with DTLS underneath. Options for key generation and administrative functions for the certificate files should also be implemented here.
6.3 RTP over DTLS

The requirements can now be formulated in more detail, independently of the implementation design. In order to build a unified media security framework, changes need to be made to all components that interact with each other. Before any connection with RTP over DTLS can take place, the user must select the option that RTP over DTLS should be used if it is available to both communication partners.
The SIP component needs to support the mechanisms necessary to cope with essentially four cases. In the first case the connection is established without errors, because both communication partners have a properly running system that supports RTP over DTLS. In the second case there is an error on the caller side, which might occur when certificates cannot be accessed. The caller should be notified of this as early as the moment the settings are adjusted to use RTP over DTLS for calls. In the third case the RTP over DTLS feature is not supported by the callee; either the connection is established without any protection, or the next security mechanism supported by both sides is used. The caller must of course be notified that the connection is not secured in the intended way. In the last case the security certificates cannot be verified, or the DTLS connection cannot be initialised properly for other reasons, so that a secure connection cannot be guaranteed. In this case the user needs to be informed immediately about the situation and advised what it means and what to do.
When the call is accepted by the callee and both parties have RTP over DTLS available, this component is started to initialise the DTLS sockets. The RTP session needs to be divided into a server (passive) and a client (active) part, where the client initiates the DTLS connection to the server and the server accepts the client's connection request. When the connection is established, the RTP data transfer can start. At the end of the session the DTLS connection needs to be shut down properly. DTLS negotiates the ciphers during the handshake (see the Background section) and exchanges certificates and keys. These keys must be generated and the certificates provided. This task will be done by the SIP application using functions provided by OpenSSL.
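The four cases the SIP component has to handle can be summarised as a small decision function. The following Python sketch is only an illustration of the case distinction described above; the names Outcome and classify_call and the three boolean inputs are hypothetical and are not part of Twinkle, ccRTP or OpenSSL.

```python
from enum import Enum


class Outcome(Enum):
    SECURE_CALL = "both sides support RTP over DTLS, call is secured"
    CALLER_ERROR = "caller-side error, e.g. certificates not accessible"
    FALLBACK = "callee lacks RTP over DTLS, fall back and warn the caller"
    HANDSHAKE_FAILED = "verification or handshake failed, warn the user"


def classify_call(caller_certs_ok, callee_supports_dtls, handshake_ok):
    """Map the situation at call setup to one of the four cases (hypothetical helper)."""
    if not caller_certs_ok:
        # Case 2: detected locally, ideally already when the option is enabled.
        return Outcome.CALLER_ERROR
    if not callee_supports_dtls:
        # Case 3: unprotected call or next commonly supported mechanism.
        return Outcome.FALLBACK
    if not handshake_ok:
        # Case 4: certificates not verifiable or DTLS setup failed.
        return Outcome.HANDSHAKE_FAILED
    # Case 1: everything worked.
    return Outcome.SECURE_CALL
```

Every outcome other than SECURE_CALL implies a user notification, which is why the sketch returns a distinct value for each case instead of a simple success flag.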
6.4 Choice of Libraries

This section presents the libraries chosen to implement the protocols used, namely DTLS, RTP and SIP. The choice of the DTLS implementation on which the prototype is based is straightforward, since to the best of our knowledge OpenSSL is the only implementation supporting this protocol. For RTP a choice has to be made, since several implementations exist (e.g. ORTP, ccRTP). In comparison to ORTP, ccRTP provides object-oriented C++ code and is therefore better suited to a change of the underlying transport protocol. The online documentation of the ccRTP library is a great help in understanding the class structure of the library. This makes the choice of ccRTP easy. To complete the prototype, the Twinkle softphone client, a SIP client that uses ccRTP as its RTP stack, seems the most reasonable choice.
6.4.1 OpenSSL
OpenSSL (http://www.openssl.org/) [46] is the de facto standard open source TLS/SSL implementation [2]. It has proven to be stable and is used by numerous production-quality servers such as the Apache web server. OpenSSL implements SSLv2, SSLv3, TLSv1 and DTLSv1. These protocols are implemented by sharing as much code as possible, with virtual functions handling the protocol differences. The library is implemented in C, and from the library's standpoint DTLS appears to be just another version of the TLS protocol.
6.4.2 CCRTP
GNU ccRTP (http://www.gnu.org/software/ccrtp/) is an implementation of RTP, the real-time transport protocol from the IETF (RFC 3550, RFC 3551 and RFC 3555). The library is implemented in C++ and based on GNU Common C++ (http://www.gnu.org/software/commoncpp/). It aims to provide a high-performance, flexible and extensible standards-compliant RTP stack with full RTCP support. It is defined as an application-layer framework rather than as a typical Internet transport protocol such as TCP or UDP. The design of ccRTP takes both audio and video data into account. Unicast, multi-unicast and multicast transport models are supported, as well as multiple active synchronisation sources, multiple RTP sessions (SSRC spaces) and multiple RTP applications (CNAME spaces). This allows its use for building all forms of Internet-standards-based audio and video conferencing systems [47].
ccRTP uses packet queues for the reception and transmission of data packets. The synchronisation of both outgoing and incoming media is handled automatically within the packet queues. There is support for RTCP and for other standard and extended features needed for both compatible and advanced streaming applications. The implementation uses templates to isolate threading and socket-related dependencies, so that it can be used to implement real-time streaming with different threading models and underlying transport protocols, which is an essential feature for this work.
At its highest level, ccRTP provides classes for the real-time transport of data through RTP sessions, as well as the control functions of RTCP. The main concept in the ccRTP implementation of RTP sessions is the use of packet queues to handle transmission and reception of RTP data packets (application data units). In ccRTP, a data block is transmitted by putting it into the transmission (outgoing packets) queue, and received by getting it from the reception (incoming packets) queue.
6.4.3 Twinkle Softphone

Twinkle (http://www.twinklephone.com/) [48] is a softphone for VoIP and instant messaging communication using the SIP protocol, based on open source and open standards. Twinkle uses the ccRTP stack, which qualifies it to be the SIP application in the RTP over DTLS prototype implementation. As a useful feature, Twinkle also implements direct IP-to-IP phone communication, for which no SIP proxy is needed. The SIP invitation is submitted directly to the IP address of the callee. This is useful for development and testing, since the complete SIP architecture including a proxy does not need to be reproduced in the testbed. The current version does not provide video calls; this feature is planned for future releases, and its absence is not a problem for the RTP over DTLS prototype at this stage, since the focus lies primarily on functionality tests for voice calls. Video calls and securing them will be an interesting topic for future work once it has been shown that RTP over DTLS works reliably.
7 Design Details
This chapter describes the implementation process, its milestones and the problems handled along the way. First the protocol operations are presented, followed by the way the components of the prototype implementation of the unified media security framework work together. The previous chapter provided the analysis and with it all the information necessary to design a solution. In this chapter the architecture and interfaces of the component to be developed are designed, and its adaptation to the existing structure and interfaces is planned.
7.1 Design Components: RTP - ccRTP, DTLS - OpenSSL and SIP - Twinkle
This section describes the interaction of the components used to design the unified media security framework. Each library used is described together with its interaction with the other libraries.
7.1.1 OpenSSL
The OpenSSL website provides online documentation of the application programming interface (API) to ease the implementation of a secure socket. However, although DTLS has been supported by OpenSSL for more than a year, it is not mentioned at all in the documentation. Only TLS is mentioned as an optional protocol version.
7.1.2 Socket Initialisation

According to the documentation, the library must first be initialised, whereby all available ciphers and digests are registered. Next an SSL_CTX object is created as a framework for establishing SSL-based connections. An SSL_METHOD object is then assigned to the context in order to determine the protocol version used. Various options regarding certificates, algorithms etc. can be set on the context. After a network connection has been created, it can be assigned to an SSL object, which is created from the SSL_CTX object set up before. Next the handshake is performed with SSL_accept on the server side and SSL_connect on the client side. The SSL_read and SSL_write functions are used to read and write data on the connection, while SSL_shutdown is used to shut the connection down.
Additional hints on how a DTLS connection can be established are provided by the demonstration programs s_server and s_client. These all-round examples are able to establish any kind of SSL connection with their roughly 3500 lines of code. The code itself is barely commented and gives little information about which functions need to be called to establish a connection. As an example, in the s_client.c file a comment in line 735 starts with "This is an ugly hack that does a lot of assumptions [...]" [46]. However, there is a large mailing list archive that covers a handful of issues concerning DTLS connections.
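The order of the steps described above matters, since each step depends on the object created by the previous one. The following Python sketch is a toy model of that ordering only; it does not call OpenSSL, and the class ConnectionLifecycle and the step names are hypothetical labels chosen for illustration.

```python
# Steps in the order described in the text. The comments name the OpenSSL
# functions mentioned there; they are labels only, nothing here calls OpenSSL.
LIFECYCLE = [
    "library_init",    # register ciphers and digests
    "create_context",  # SSL_CTX plus protocol method and options
    "create_ssl",      # SSL object bound to the network connection
    "handshake",       # SSL_connect (client) or SSL_accept (server)
    "transfer",        # SSL_read / SSL_write, may repeat
    "shutdown",        # SSL_shutdown
]


class ConnectionLifecycle:
    """Hypothetical helper that rejects steps performed out of order."""

    def __init__(self):
        self.done = []

    def step(self, name):
        # Data transfer may repeat; every other step happens once, in order.
        if name == "transfer" and self.done and self.done[-1] == "transfer":
            return
        expected = LIFECYCLE[len(self.done)] if len(self.done) < len(LIFECYCLE) else None
        if name != expected:
            raise RuntimeError(f"expected {expected!r}, got {name!r}")
        self.done.append(name)
```

A caller that attempts the handshake before an SSL object exists is rejected by this model, which mirrors the dependency between the objects described in the documentation.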
7.1.3 Session Initialisation with ccRTP

Upon initialisation of an RTP session an object of the class RTPSession is created. There are two kinds of constructors. The first takes two mandatory arguments, the local network address and the local transport port, which is where incoming packets will be expected. The second constructor is not of interest here, since it takes a multicast address as argument in order to join a multicast group. By calling the startRunning() method, an RTPSession object is signalled to start execution of the stack thread. After these steps the application can receive data, but will not transmit to any destination.
In order to transmit, the method addDestination is called with the Internet address and port of the host to transmit to.
7.1.4 Sending Data

Data packets are sent through the method putData, which takes as its first parameter the RTP timestamp for the data specified as the second parameter. By default, the marker bit of the sent packets is not set. Its value for the next packet (the one that will convey the data provided in the next call to putData) can be set through the setMark method, which takes a Boolean as argument. ccRTP also supports fragmenting data blocks into several RTP packets. The setMaxSendSegmentSize method can be used to request that no RTP packet be transmitted with a payload length greater than the value specified. When data blocks greater than the maximum segment size are provided through putData, two or more packets are inserted into the outgoing packet queue. All of these packets except the last have a length equal to the maximum segment size, whereas the size of the last one is lower than or equal to the maximum segment size.
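The segmentation rule can be illustrated with a few lines of Python. This is a minimal sketch of the rule as stated above, not ccRTP code; the function name split_into_segments and the example sizes are assumptions made for illustration.

```python
from typing import List


def split_into_segments(block: bytes, max_segment_size: int) -> List[bytes]:
    """Split a data block so that no segment exceeds max_segment_size.

    All segments except the last have exactly max_segment_size bytes;
    the last one is lower than or equal to that size.
    """
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    return [
        block[offset:offset + max_segment_size]
        for offset in range(0, len(block), max_segment_size)
    ]


# Example with made-up sizes: a 400-byte block and a 160-byte limit.
segment_lengths = [len(s) for s in split_into_segments(bytes(400), 160)]
```

In this example the block is split into three segments of 160, 160 and 80 bytes, so three packets would be placed in the outgoing queue.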
7.1.5 Receiving Data

To receive data from the incoming packet queue the getData method is used. This method checks, with a defined timeout, whether data can be read from the socket, and in that case returns a pointer to an AppDataUnit object rather than a pointer to a memory block. In ccRTP, application data units are represented by objects of the AppDataUnit class, which provides access to the synchronisation source of the data and other related properties. The incoming packet queue takes care of functions such as packet reordering and filtering out duplicate packets.
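What reordering and duplicate filtering mean for the application can be shown with a simplified model. The following Python sketch is an assumption-laden illustration, not the ccRTP queue: the function order_and_deduplicate is hypothetical, and it ignores the wrap-around of the 16-bit RTP sequence number as well as all timing aspects.

```python
def order_and_deduplicate(packets):
    """Return packets sorted by sequence number with duplicates removed.

    packets is an iterable of (sequence_number, payload) tuples in arrival
    order. The first copy of each sequence number is kept.
    """
    first_copy = {}
    for sequence_number, payload in packets:
        # setdefault keeps the payload that arrived first for this number.
        first_copy.setdefault(sequence_number, payload)
    return [(number, first_copy[number]) for number in sorted(first_copy)]


# Packets 2 and 1 arrive swapped, and packet 2 arrives twice.
arrived = [(2, b"b"), (1, b"a"), (2, b"b"), (3, b"c")]
delivered = order_and_deduplicate(arrived)
```

The application then sees the packets in the order 1, 2, 3, each exactly once.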
7.1.6 Closing Sessions

To close an RTP session, the RTPSession object simply has to be destroyed. When the destructor of the session is called, the stack transmits a BYE packet, indicating the end of the session, to all destinations.
7.1.7 Types of Sessions

Upon creation of an RTPSession object, two DualRTPChannel objects are created with DualUDPIPv4Socket. This defines a communication channel for RTP and/or RTCP streams. In this class a socket is implemented as a pair of UDP IPv4 sockets, allowing both transmission and reception of packets. The implementation relies on the Common C++ UDP socket class and provides a flat interface that includes all the services required by the RTP stack. There are two ways to use this class: to instantiate the DualSocket template, which is then used to instantiate the RTP stack template, or to instantiate an RTP stack template directly. This class offers an example of the interface that other classes have to provide in order to specialise the ccRTP stack for different underlying protocols.
7.2 SIP Session Initiation with Twinkle

As already mentioned, Twinkle can be operated either in regular SIP mode, using a SIP provider to discover a communication partner by SIP address as shown in figure 2.6 on page 25, or in direct mode. In both cases the RTP media channel is set up when the callee accepts the call. Twinkle uses a symmetric RTP session consisting of two single-thread RTP sessions, one for data traffic and one for control packets. In order to establish a secure RTP over DTLS session, the modified templates in the ccRTP library are used instead of the regular ones.
Figure 7.1: Implementation status after phase 1
7.3 Implementation Process

The implementation process is divided into three consecutive steps. This stepwise model allows changes to the implementation and the marking of milestones to confirm the progress achieved. In the first phase a DTLS client/server application is implemented in C++ as the basis for any further development. As a result of the poor documentation provided by OpenSSL, this step proved to be a much greater challenge than expected. A DTLS client/server example written in C (found at http://linux.softpedia.com/get/Security/DTLS-Client-Server-Example-19026.shtml) was used as a guideline in this phase, because the examples provided by OpenSSL were not clearly arranged and therefore not usable. The result of the first step is illustrated in figure 7.1 on page 53, where a DTLS connection is established between host A and host B.
As soon as a secured connection between two hosts is possible, this connection-initiation functionality can be used by the ccRTP stack, replacing the regular underlying UDP sockets with DTLS sockets. At the end of this stage, first transmissions of audio data with test applications should work and demonstrate the functionality of RTP over DTLS, as presented in figure 7.2 on page 54. Here test.au is an audio file in the au file format, a simple audio file format introduced by Sun Microsystems (http://www.sun.com). Further information can be found in [49]. When the RTP connection between the two hosts is set up, the DTLS connection is established during initialisation of the transport channel, at the point where the UDP sockets were previously initialised.
In the last stage all parts of the preceding steps have to work together in order to function as a secured VoIP call. Figure 7.3 on page 54 illustrates the progress at this stage. While stage 3 marks the goal of this thesis, it is not the end of the process. Further implementation work is needed to provide a usable application. These steps are presented in the future work section at the end of this thesis.
Figure 7.4: RTP over DTLS class structure
7.4 Class Structure

This section describes the changes applied to the libraries and presents the new files inserted into the class hierarchy. In the ccRTP library the channel.h file defines the RTPSession types and the initialisation of the underlying transport protocols. The regular RTP session inherits the UDP socket from the Common C++ UDP socket class and implements the functionality for the RTP stack. In order to implement RTP over DTLS, two template classes, RTPDTLSServer and RTPDTLSClient, were added to the channel.h file. Each of these templates is associated with an interface to the OpenSSL library that provides the functionality for connection initiation and certificate verification. These files are placed in the /src/rd directory of ccRTP. The classes make direct use of the OpenSSL API and socket classes. Figure 7.4 on page 55 presents the structure of the added components. With this structure, any program using RTP is able to initialise RTP over DTLS sessions instead of, or as an alternative to, regular RTP sessions.
7.5 Problems and Discussion

The solution provided is functional but unfortunately not perfect, owing to the way RTP works. Upon initiation of an RTP session two sockets are created, one for transmission and one for reception. Since RTP supports multicast, these sockets do not have any information about the destination host upon initialisation. RTP uses a setPeer function, which is called periodically, to set the destination IP address on the socket. This feature is not compatible with a DTLS connection, for which the destination IP address must be known upon initialisation. Therefore the DTLS connection is initialised upon the first call of the setPeer function and not in the constructor of the session, since the destination IP address could not be handed to the constructor without changes to the RTP stack implementation. Further calls to the addDestination function must therefore not cause any action.
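The workaround described here is a form of lazy initialisation: the connection is set up on the first call that supplies the peer address, and later calls are ignored. The following Python sketch illustrates the pattern under stated assumptions; the class LazyDtlsChannel, its set_peer method and the connect callback are hypothetical and do not reproduce the actual C++ code added to ccRTP.

```python
class LazyDtlsChannel:
    """Hypothetical channel that connects on the first set_peer call only."""

    def __init__(self, connect):
        # connect is a callable(host, port) standing in for the DTLS
        # connection setup and handshake; it is an assumption of this sketch.
        self._connect = connect
        self._peer = None

    def set_peer(self, host, port):
        """Initialise the connection once; ignore all later calls."""
        if self._peer is not None:
            return False  # already connected, no action
        self._connect(host, port)
        self._peer = (host, port)
        return True


# Record the handshakes that would have been performed.
handshakes = []
channel = LazyDtlsChannel(lambda host, port: handshakes.append((host, port)))
first = channel.set_peer("192.0.2.10", 5004)
second = channel.set_peer("192.0.2.10", 5004)
```

Although set_peer is called twice, only one handshake is recorded, which is the behaviour required of the periodic calls made by the RTP stack.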
8 Testing
The prototype implementation of RTP over DTLS is tested in order to confirm the usability of the approach. The approach could be tested in many respects; however, owing to space and time restrictions, not all aspects of RTP over DTLS have been analysed in this thesis.
8.1 Testing Methodology

Before the results of the tests are presented, an overview of the testing methodology and the testbed is given. As formulated in the goals of this thesis, a packet loss rate of up to 5% is still acceptable for telephony according to the ITU-T. The implementation of RTP over DTLS shall therefore provide a packet loss rate lower than 5%. Changing the underlying transmission to a DTLS connection does not in itself cause more packets to be lost. However, encryption, decryption and the increased header size, which raises the bandwidth needed to achieve the same throughput, might cause data packets to reach the destination too late to be inserted into the output media stream. The question to be answered is therefore whether RTP over DTLS is capable of delivering the audio data within the strict time limits that allow acceptable voice quality during a call. During the phone call the delays must be kept within the limits for VoIP traffic to allow a fluent conversation. According to the ITU-T, delays in telephony should not exceed 150 ms in order to provide satisfying quality for all users.
8.2 Testbed Setup

The experiments were run on standard PCs running Suse Linux with kernel 2.6.18-05 and the following hardware:
Machine A: Intel Pentium D processor with 3.06 GHz, 512 MB RAM, 40 GB hard disk, one 100 Mbit network interface card (NIC)
Machine B: AMD Duron processor with 800 MHz, 612 MB RAM, 60 GB hard disk, one 100 Mbit NIC
The hosts are connected in a 100 Mbit Ethernet network with the topology presented in figure 8.1 on page 59.
8.3 Measurement Methods and Tools

To prove the functionality and usability of the RTP over DTLS prototype, modified versions of demonstration programs provided with the ccRTP library were used. Timestamp output was added to the applications in order to determine the delay between transmission and reception of a data packet. OpenOffice Calc (http://www.openoffice.org/), a spreadsheet program, was used to calculate the delay of a data packet as the time difference between the transmission and reception timestamps. Plots and summaries of the tests were generated from the report files with Gnuplot [50] (see footnote 2).
Figure 8.1: Testbed for RTP over DTLS tests
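The delay calculation performed in the spreadsheet amounts to a per-packet subtraction followed by summary statistics. The following Python sketch shows this calculation; the function names and the three timestamp pairs are invented for illustration and are not measurement data from the experiments. The population standard deviation is used here as an assumption, since the text does not state which variant was applied.

```python
from statistics import mean, pstdev


def packet_delays(tx_timestamps_us, rx_timestamps_us):
    """Delay per packet as reception time minus transmission time (microseconds)."""
    return [rx - tx for tx, rx in zip(tx_timestamps_us, rx_timestamps_us)]


def summarise(delays_us):
    """Average, minimum, maximum and standard deviation of the delays."""
    return {
        "mean": mean(delays_us),
        "min": min(delays_us),
        "max": max(delays_us),
        "stdev": pstdev(delays_us),
    }


# Illustrative timestamps only: three packets sent 20 ms apart.
tx = [0, 20000, 40000]
rx = [13000, 31000, 55000]
summary = summarise(packet_delays(tx, rx))
```

For these made-up values the delays are 13 ms, 11 ms and 15 ms, with a mean of 13 ms.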
8.4 Results
This section presents the results of the experiments. The aim of the performance test is to determine the delay caused by DTLS encryption of RTP traffic. The tests were performed with modified versions of the ccRTP demonstration programs audiorx and audiotx. These applications initiate RTP sessions and transmit audio data from audiotx to audiorx, where audiorx plays the audio data over the system's audio interface. In their original version these applications use the loopback address to simulate RTP traffic on a single machine. By changing the IP addresses used, the programs are capable of transmitting data from one host to another.
Audiorx uses a 50 ms jitter buffer to ensure a continuous media stream during reception. Jitter is the variation of the packet interarrival time. While the sender is expected to transmit a packet every 20 ms, these packets can be delayed in the network and may not arrive at the same regular interval at the receiver. The difference between the time a packet is expected and the time it is actually received is the jitter. The jitter buffer conceals this variation in packet interarrival delay. Data packets arriving with a delay greater than 50 ms are not played; instead the next packet that has arrived is played. In VoIP applications the jitter buffer is flexible in order to adapt to the delay in the current call.
In order to analyse the performance of RTP over DTLS, the RTP over DTLS server and client session objects were initialised in these applications instead of a regular RTP session. To obtain comparable results, a 62.5 KB audio file with a play time of 7 seconds was used for transmission to simulate the voice data of a call. In total 399 data packets of audio data were transmitted. To account for possible measurement inaccuracy and errors due to the experimental environment, all tests were repeated to verify the results.
Footnote 2: http://gnnuplot.info
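The effect of the fixed 50 ms jitter buffer on individual packets can be modelled as a simple threshold decision. The following Python sketch is a simplification under stated assumptions: the function split_by_deadline is hypothetical, it works on per-packet delays rather than on interarrival times, and it does not model the adaptive buffers used in VoIP applications.

```python
JITTER_BUFFER_US = 50_000  # 50 ms jitter buffer, as used by audiorx


def split_by_deadline(delays_us, limit_us=JITTER_BUFFER_US):
    """Return (played, skipped) packet indices for the given per-packet delays.

    A packet whose delay is greater than the limit is skipped; all others
    are played.
    """
    played, skipped = [], []
    for index, delay in enumerate(delays_us):
        (skipped if delay > limit_us else played).append(index)
    return played, skipped


# Illustrative delays in microseconds; the second packet is too late.
played, skipped = split_by_deadline([13000, 51000, 50000, 4000])
```

A packet arriving after exactly 50 ms is still played in this model, because the text only excludes delays greater than 50 ms.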
Figure 8.2: Delay for normal RTP packets (plot title: Transmission of Audio Data; series: RTP Packet Delay; x-axis: Packet No., 0 to 350; y-axis: Delay in microseconds, 0 to 100000)
8.5 Standard RTP Packet Delay

The first performance test examines the packet delay of regular RTP packets in order to obtain a reference value for comparison. Figure 8.2 shows a typical output of this test. The average delay of an RTP packet was 13 ms; however, some packets arrived significantly later. The maximum delay measured was 51 ms, while the minimum delay was only 4 ms. The standard deviation of the delay during the experiment was 6.1 ms. One data packet did not arrive within the preset time limit, so the packet loss rate is 0.25%. In the graph the lost packet is marked by the high peak (51 ms delay) shortly before the 200th packet. Since the audio file is played during reception, a subjective impression of the result can be given as well.
The sound file was played continuously and without any disturbance, as clearly as it would have been played locally.
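The reported loss rate follows directly from the packet counts given in the text: one late packet out of 399 transmitted. A minimal Python check of this arithmetic, with a hypothetical helper name, is:

```python
def loss_rate_percent(lost, transmitted):
    """Share of packets that missed the playout deadline, in percent."""
    if transmitted <= 0:
        raise ValueError("transmitted must be positive")
    return 100.0 * lost / transmitted


rate = loss_rate_percent(1, 399)   # about 0.25 %
meets_target = rate < 5.0          # 5 % limit stated in the testing methodology
```

The result is well below the 5% limit set as a requirement in the testing methodology.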
Figure 8.3: Delay for RTP over DTLS packets (plot title: Transmission of Encrypted Audio Data; series: RTP over DTLS Packet Delay; x-axis: Packet No., 0 to 350; y-axis: Delay in microseconds, 0 to 100000)
8.6 RTP over DTLS Packet Delay

The second experiment determines the delay of RTP traffic over DTLS. The same demo applications were used as in the preceding experiment, with the change that the programs now initiate RTP over DTLS sessions. Repeated tests showed results similar to those in figure 8.3. The average delay of an encrypted RTP packet was 34 ms; however, some packets arrived significantly later. The maximum delay measured was 92 ms, while the minimum delay was only 9 ms. The standard deviation of the delay during the experiment was 7.7 ms. One data packet did not arrive within the preset time limit, so the packet loss rate is 0.25%. In the graph the lost packet is marked by the high peak (92 ms delay) at the beginning. The sound file was played continuously and without any disturbance, as clearly as it would have been played locally.
8.7 CPU Usage

In a separate experiment the average CPU load was measured. Since a machine has to handle both transmission and reception during a VoIP call, this experiment was carried out on a single machine, the 3.06 GHz one. Audiotx and audiorx initialised the DTLS connection and transmitted an audio file with a length of 1:25 minutes. Repeated tests showed that RTP has an average CPU load of 1.45%, while RTP over DTLS has an average CPU load of 3.4%. The significant increase is caused by the handshake and by the encryption and decryption during the session. For normal PCs used for VoIP this increase is not a problem, but it could be one for today's generation of handheld devices. Further investigation is therefore necessary to analyse the impact of the increased CPU load on different terminal devices such as cell phones.
8.8 Test Summary

The results of the two experiments show that RTP over DTLS works in an acceptable manner. The delay of encrypted RTP packets was expected to be higher than the delay of unencrypted packets, owing to the encryption and decryption operations and the additional packet overhead of DTLS. The question to be answered was whether the delay of encrypted RTP packets meets the requirements for VoIP traffic. In the experiments the delay of regular RTP packets was measured at an average of approximately 13 ms, with a maximum of 51 ms, a minimum of 4 ms and a standard deviation of 6.1 ms. The delay of encrypted RTP packets was measured at an average of approximately 34 ms, with a maximum of 92 ms, a minimum of 9 ms and a standard deviation of 7.7 ms.
The important values in these results are the average delay, the packet loss rate and the standard deviation. The average delay increases by approximately 20 ms when DTLS encryption is used. According to the ITU-T a delay of 125 ms is noticeable by humans; it therefore recommends that delays should not exceed 150 ms. A delay of 200 to 280 ms still satisfies most users, while delays higher than 300 ms dissatisfy some users, and a delay higher than 400 ms is unacceptable because most users are dissatisfied [51]. Most of the delay in real scenarios is caused by the network infrastructure. For distances of less than 5000 km, VoIP connections are likely to experience a delay smaller than 150 ms. For intercontinental connections delays in the mid-200 ms range can be expected, which according to the ITU-T is not a problem because users expect such calls to differ from regional calls.
Compared directly, the average RTP over DTLS delay is more than twice as long as the regular RTP delay, but the delays should be seen in relation to the ITU-T limits. An average delay increase of 20 ms corresponds to about 13% of the recommended 150 ms limit. The small increase (1.6 ms) in the standard deviation is a good result as well: it means that the jitter buffer does not need to be enlarged significantly. Therefore RTP over DTLS is well suited for the encryption of live media as in VoIP.
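The comparison in this summary can be retraced from the reported averages. The following Python sketch uses only the rounded figures given in the text; the variable names are chosen for illustration. With the rounded averages of 13 ms and 34 ms the difference is 21 ms, which the text rounds to approximately 20 ms; relative to the 150 ms recommendation this is 14% for 21 ms and about 13% for 20 ms.

```python
# Rounded averages and standard deviations as reported in the text (ms).
RTP_MEAN_MS, DTLS_MEAN_MS = 13.0, 34.0
RTP_STDEV_MS, DTLS_STDEV_MS = 6.1, 7.7
ITU_LIMIT_MS = 150.0  # recommended upper bound for the delay

added_delay_ms = DTLS_MEAN_MS - RTP_MEAN_MS             # 21 ms, "approximately 20 ms"
ratio = DTLS_MEAN_MS / RTP_MEAN_MS                       # more than twice the RTP delay
share_of_limit = 100.0 * added_delay_ms / ITU_LIMIT_MS   # percent of the 150 ms limit
stdev_increase_ms = DTLS_STDEV_MS - RTP_STDEV_MS         # about 1.6 ms
within_limit = DTLS_MEAN_MS < ITU_LIMIT_MS
```

In this testbed the average encrypted delay of 34 ms leaves most of the 150 ms budget to the network, which is where most of the delay arises in real scenarios according to the text.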
9 Conclusion and Future Work

9.1 Conclusion
The growing acceptance of VoIP telephony among users also brings new challenges in terms of security. VoIP calls are threatened by various known attacks, since the data is transported over insecure networks. The security considerations section pointed out these attacks along with the security goals to achieve. Furthermore, new attacks (e.g. SPIT), which are enabled by the extended capabilities and new services introduced by VoIP, may threaten the widespread use of this technology in the future. Security is therefore a major concern in the further development of VoIP services. So far no solution that secures VoIP calls in an acceptable manner is widely used. The RTP over DTLS approach has the potential to overcome the shortcomings of other approaches and to take part in future developments of a security framework for VoIP. DTLS provides:
Authentication: This allows both participants of the call to verify the identity of the other party.
Confidentiality: This ensures that the VoIP call cannot be eavesdropped on or understood by a third party.
Integrity: This allows VoIP applications to detect whether data was modified during transmission.
Unfortunately DTLS cannot solve all issues in securing Internet telephony. Denial-of-service attacks against the SIP infrastructure cannot be prevented by DTLS, since RTP over DTLS is only initiated after the SIP interaction that sets up the session has taken place. For the same reason the approach does not address the issue of SPIT. Authentication could help to solve this issue, since SPIT calls could be traced back to their originators, but this would only be possible if all users used DTLS authentication. This is not possible yet, since reachability from traditional phones is still desired.
In this thesis a prototype of RTP over DTLS was implemented and tested in order to prove the usability of the approach. The remainder of this chapter summarises and evaluates the test results of the prototype. Furthermore, an outlook is given on the future work necessary to reach the goal of developing a unified media security framework for VoIP.
The datagram-capable version of TLS was designed to secure media streaming without compromising either the quality of the streamed media or the widely accepted security features of TLS. The test results show the good performance of the prototype implementation of RTP over DTLS in comparison to unencrypted RTP. The increase in delay of approximately 20 ms is within an acceptable range and allows secure communication without impact on the quality of the VoIP call. These results allow the planning of the further steps on the way to a unified media security framework for VoIP, which are presented in the following section.
9.2 Future Work and Open Issues

The prototype implementation of RTP over DTLS is capable of connection establishment and data transmission. The performance of the DTLS and RTP components was tested with acceptable results, but not in detail, so there is potential for performance improvement. In particular, some optimisation may be possible in the connection establishment, since this part was developed with almost no documentation from the developers of DTLS in OpenSSL.

The next suggested step in further development is an improvement at the DTLS level. H. Tschofenig and E. Rescorla introduced the SRTP compatibility mode [9]. With the enhancements to RTP over DTLS presented there, performance can be increased, since the overhead is reduced to a value comparable to ZRTP. Subsequently, the performance of the SRTP compatibility mode of RTP over DTLS can be compared experimentally with ZRTP in order to evaluate the approach.

The integration into the Twinkle softphone is also not completely finished. This thesis aims to contribute to the development of a unified security framework covering all components in the system. Due to time restrictions, the focus lies on the interaction of the RTP and DTLS components in order to provide a basis for further development. A concept for user interaction with the encryption scheme needs to be designed and integrated into the softphone and SIP. The challenge here lies in combining ease of use with the user's understanding of what is happening, in order to achieve acceptance among users. To this end, the management of certificates needs to be integrated into the softphone, along with user notification about the security state of the connection and proper error handling in case of a DTLS handshake failure. Furthermore, on the SIP (and SDP) side of the framework, RTP over DTLS needs to be integrated into the session invitation, so that the caller is able to inform the callee about the wish to establish an RTP over DTLS session when connections are initiated over the SIP network.
Bibliography
[1] Christian Friedrich. Schematic representation of the SSL handshake protocol with two-way authentication with certificates. Wikipedia, the free encyclopedia, 2007. [Online; accessed August 2007].
[2] N. Modadugu and E. Rescorla. The Design and Implementation of Datagram TLS, 2004.
[3] J. Postel. Internet Protocol. RFC 791 (Standard), 1981. Updated by RFC 1349.
[4] T. Dierks and E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.1. RFC 4346 (Proposed Standard), 2006. Updated by RFCs 4366, 4680, 4681.
[5] J. Postel. Transmission Control Protocol. RFC 793 (Standard), 1981. Updated by RFC 3168.
[6] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol for Real-Time Applications. RFC 3550 (Standard), 2003.
[7] J. Postel. User Datagram Protocol. RFC 768 (Standard), 1980.
[8] E. Rescorla and N. Modadugu. Datagram Transport Layer Security. RFC 4347 (Proposed Standard), 2006.
[9] H. Tschofenig and E. Rescorla. Real Time Transport Protocol (RTP) over Datagram Transport Layer Security. Internet Draft, February 2006.
[10] Wikipedia. Voice over IP. Wikipedia, the free encyclopedia, 2007. [Online; accessed 22-April-2007].
[11] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler. SIP: Session Initiation Protocol. RFC 3261 (Proposed Standard), 2002. Updated by RFCs 3265, 3853, 4320, 4916.
[12] H. Schulzrinne and C. Agboh. Session Initiation Protocol (SIP)-H.323 Interworking Requirements. RFC 4123 (Informational), 2005.
[13] T. Berson. Skype security evaluation, October 2005.
[14] R. Hancock, G. Karagiannis, J. Loughney, and S. Van den Bosch. Next Steps in Signaling (NSIS): Framework. RFC 4080 (Informational), 2005.
[15] E. Rescorla. HTTP Over TLS. RFC 2818 (Informational), May 2000.
[16] R. Rivest, A. Shamir, and L. M. Adleman. Cryptographic communications system and method. US Patent 4405829, 1977.
[17] P. Karn and W. Simpson. Photuris: Session-Key Management Protocol. RFC 2522 (Experimental), 1999.
[18] D. Brezinski and T. Killalea. Guidelines for Evidence Collection and Archiving. RFC 3227 (Best Current Practice), 2002.
[19] H. Schulzrinne. The tel URI for Telephone Numbers. RFC 3966 (Proposed Standard), 2004.
[20] Voice over misconfigured internet telephones (vomit). http://vomit.xtdnet.nl/.
[21] M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman. The Secure Real-time Transport Protocol (SRTP). RFC 3711 (Proposed Standard), 2004.
[22] P. Zimmermann, A. Johnston (Ed.), and J. Callas. ZRTP: Media Path Key Agreement for Secure RTP. Internet Draft, 2007.
[23] André Adelsbach, Mark Manulis, and Jörg Schwenk. VoIPSEC Studie. Technical report, Bundesamt für Sicherheit in der Informationstechnik.
[24] S. Kent and K. Seo. Security Architecture for the Internet Protocol. RFC 4301 (Proposed Standard), 2005.
[25] S. Kent. IP Encapsulating Security Payload (ESP). RFC 4303 (Proposed Standard), 2005.
[26] C. Kaufman. Internet Key Exchange (IKEv2) Protocol. RFC 4306 (Proposed Standard), 2005.
[27] E. Rescorla. Diffie-Hellman Key Agreement Method. RFC 2631 (Proposed Standard), 1999.
[28] D. Maughan, M. Schertler, M. Schneider, and J. Turner. Internet Security Association and Key Management Protocol (ISAKMP). RFC 2408 (Proposed Standard), 1998. Obsoleted by RFC 4306.
[29] D. Piper. The Internet IP Security Domain of Interpretation for ISAKMP. RFC 2407 (Proposed Standard), 1998. Obsoleted by RFC 4306.
[30] H. Orman. The OAKLEY Key Determination Protocol. RFC 2412 (Informational), 1998.
[31] H. Krawczyk. SKEME: A versatile secure key exchange mechanism for Internet. In Proceedings of the 1996 Symposium on Network and Distributed System Security (SNDSS 96), 1996.
[32] Wikipedia. IPsec. Wikipedia, the free encyclopedia, 2007. [Online; accessed June 2007].
[33] S. Kent. IP Authentication Header. RFC 4302 (Proposed Standard), 2005.
[34] Whitfield Diffie, Paul C. van Oorschot, and Michael J. Wiener. Authentication and authenticated key exchanges. Designs, Codes and Cryptography, 2(2):102-125, 1992.
[35] S. Casner and V. Jacobson. Compressing IP/UDP/RTP Headers for Low-Speed Serial Links. RFC 2508 (Proposed Standard), 1999.
[36] C. Bormann, C. Burmeister, M. Degermark, H. Fukushima, H. Hannu, L-E. Jonsson, R. Hakenberg, T. Koren, K. Le, Z. Liu, A. Martensson, A. Miyazaki, K. Svanbro, T. Wiebke, T. Yoshimura, and H. Zheng. RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP, and uncompressed. RFC 3095 (Proposed Standard), 2001. Updated by RFCs 3759, 4815.
[37] T. Koren, S. Casner, J. Geevarghese, B. Thompson, and P. Ruddy. Enhanced Compressed RTP (CRTP) for Links with High Delay, Packet Loss and Reordering. RFC 3545 (Proposed Standard), 2003.
[38] D. Ignjatic, L. Dondeti, F. Audet, and P. Lin. MIKEY-RSA-R: An Additional Mode of Key Distribution in Multimedia Internet KEYing (MIKEY). RFC 4738 (Proposed Standard), November 2006.
[39] Joan Daemen and Vincent Rijmen. The Design of Rijndael: AES - The Advanced Encryption Standard. Springer-Verlag, 2002.
[40] ISO/IEC. Information technology - Security techniques - Code of practice for information security management, June 2005.
[41] N. Modadugu and E. Rescorla. Extensions for DTLS in low bandwidth environments. draft-rescorla-tls-partial-00, October 2005.
[42] E. Rescorla. TLS partial encryption mode. draft-rescorla-tls-partial-00, October 2005.
[43] A. Kapoor, R. Tschalar, and T. Kause. Internet X.509 public key infrastructure transport protocols for CMP. Internet-Draft, February 2004. http://tools.ietf.org/id/draft-ietf-pkix-cmp-transport-protocols-05.txt.
[44] J. Fischl and H. Tschofenig. Session Initiation Protocol (SIP) for media over Transport Layer Security (TLS), February 2006.
[45] J. Fischl and H. Tschofenig. Session Description Protocol (SDP) indicators for Datagram Transport Layer Security (DTLS). draft-fischl-mmusic-sdp-dtls-00, February 2006.
[46] The OpenSSL project. http://www.openssl.org.
[47] The GNU ccRTP library. http://www.gnu.org/software/ccrtp/.
[48] The Twinkle softphone project. http://www.twinklephone.com/.
[49] Header file for the AU file format. http://www.opengroup.org/public/pubs/external/auformat.html.
[50] Gnuplot. http://www.gnuplot.info/.
[51] International Telecommunication Union. Recommendation G.114 - One-way Transmission Time. Series G: Transmission Systems and Media, Digital Systems and Networks, May 2003.