
UMTS Gi Interfacing and

Measuring System

Dimitris Tsaimos

Master of Science Thesis


Stockholm, Sweden 2006

ICT/ECS-2006-04
UMTS Gi Interfacing and Measuring System

December 2005

Dimitris Tsaimos
<dtsaimos@keletron.com>

Industrial Supervisor
Dr Nikos Mouratidis, Technical Manager, Systems Design Group,
Keletron LTD

Academic Supervisor and Examiner


Dr Axel Jantsch, Department of Microelectronics and Information
Technology, Royal Institute of Technology
Institute for Microelectronics and Information Technology
UMTS Gi Interfacing and Measuring System

Abstract
Mobile network test tools have always played an important role in the evolution of
mobile networks and the deployment of new standards and services. Their major
advantage is that they provide a common ground for networking equipment
vendors and network operators to examine the behavior of their equipment and/or
network topology under various circumstances.

The work performed in this thesis project is an integral part of a larger Keletron
objective, relating to the development of a test tool/terminal equipment product line
based on the Compact PCI system interface and targeted at UMTS (3G)
implementations. Specifically, the goal of the thesis project was the implementation of
the interfacing and measuring system for the Gi interface, which provides the
interconnection between the Gateway GPRS Support Node (GGSN) and the external
packet data network.

For this purpose two systems were implemented, one related to traffic generation and
one to traffic monitoring and statistics gathering. Each system was realized in the
Virtex-II Field Programmable Gate Array device included in the test tool. Parameterized
architectures were designed and implemented for both the traffic generation and the
traffic monitoring modules, using the VHSIC Hardware Description Language (VHDL).
The final instances of the architectures were chosen based on the test tool throughput
requirements and went through the simulation, synthesis and implementation process in
order to verify their functionality and performance.

-2-
Dimitris Tsaimos MSc Thesis Royal Institute of Technology

Sammanfattning
Test tools for mobile networks have always played an important role in the evolution of
mobile networks and the spread of new standards and services. Their advantage is that
they form a common ground for equipment vendors and network operators.

The work presented in this report is an integral part of a larger Keletron goal, namely
the development of a test tool product based on the Compact PCI system interface and
targeted at UMTS (3G). The specific goal of the project was the realization of the
interfacing and measuring system for the Gi interface, which is the connection between
the Gateway GPRS Support Node (GGSN) and the external data network.

Two implementations were carried out: one system for traffic generation and one for
traffic monitoring and the gathering of statistical information. Each system was realized
in a Virtex-II FPGA device. The architecture is configurable and was developed for both
traffic generation and monitoring. The implementation meets the requirements of the
test tool and was verified through synthesis and simulation.


Acknowledgments
This thesis was performed at Keletron LTD in Thessaloniki, within the Systems Design
Group division. I would like to express my deepest thanks to my supervisor at Keletron,
Dr. Nikos Mouratidis, for his help throughout this project, both in terms of architectural
concepts and VHDL programming. Additionally, great thanks to Dr. Spiros Tombros for
all the hours we spent discussing mobile network testing. Both of them have
always been there whenever I needed them, sharing their knowledge and making this
project a valuable learning experience.

I would also like to thank Dr. Axel Jantsch for his support, advice and feedback
throughout this project.

Great thanks to my family, who have always been there for me, making me believe in my
goal and helping me remain focused at the times I really needed it. Hari, Koula, Alex, this
project owes a great deal to you all.

Last but not least, words alone seem very poor to thank Roula for her support and for
the endless hours she spent on the couch waiting for me to finish a thesis chapter. Her
smile gives me the strength to keep walking…

Thessaloniki,
January 30th 2006


Contents
1. Introduction ............................................................................................................. 11
1.1 3rd Generation Partnership Project .................................................................. 11
1.2 GPRS and UMTS ............................................................................................ 11
1.3 Mobile Networks Test Tools ............................................................................ 12
1.4 Test Tools Measurement Approaches ............................................................ 13
1.5 UMTS Network Architecture - High Level View ............................................... 14
1.6 Test Tools Operating Principles ...................................................................... 15
1.7 Thesis Overview.............................................................................................. 16
2. UMTS and GPRS Networks.................................................................................... 18
2.1 UMTS Building Elements ................................................................................ 18
2.2 The GPRS Network ......................................................................................... 20
2.2.1 Packet Routing and Transfer Functions .................................................. 21
2.2.2 GPRS Support Nodes ............................................................................. 23
2.3 GPRS Session Management and Routing ...................................................... 24
2.3.1 Packet Data Protocol Address Assignment............................................. 24
2.3.2 Packet Data Protocol Context ................................................................. 25
2.3.3 Packet Routing Process .......................................................................... 26
2.4 UMTS - IP Networks Interworking ................................................................... 26
2.4.1 Gi Protocol Stack..................................................................................... 27
2.4.2 Point-to-Point Protocol............................................................................. 27
2.4.3 Layer 2 Tunneling Protocol...................................................................... 28
2.4.4 Packet Transfer Procedure...................................................................... 28
2.5 UMTS Gi Interfacing and Measuring System .................................................. 30
2.6 Additional IP-related Functionality................................................................... 30
3. Test Tool Functional Principles ............................................................................... 31
3.1 Keletron Test Tool Functionality...................................................................... 32
3.2 Building Components ...................................................................................... 32
3.3 Configuration Process ..................................................................................... 33
3.3.1 Traffic and User Profiles .......................................................................... 33
3.3.2 Configuration Files................................................................................... 34
3.3.3 Test Scenarios Creation .......................................................................... 35
3.4 Simulation Time and Ticks .............................................................................. 35
3.5 Test Tool Hardware Architecture..................................................................... 36
3.5.1 Master Board ........................................................................................... 36
3.5.2 Rear Transition Modules ......................................................................... 37
3.6 Ethernet........................................................................................................... 38
3.6.1 IEEE802 Data Link Sublayers ................................................................. 38
3.6.2 Medium Independent Interface................................................................ 39
3.7 IXF440 Multiport 10/100Mbps Ethernet Controller .......................................... 39
4. Hardware Timer Management ................................................................................ 43
4.1 Timer Management ......................................................................................... 43
4.2 Theoretical Background .................................................................................. 44
4.2 Timer Management Data Structures ............................................................... 44
4.2.1 Unordered Data Structures...................................................................... 45
4.2.2 Ordered Data Structures ......................................................................... 45
4.2.3 Hashed Data Structures .......................................................................... 45


4.2.4 Hierarchical Data Structures.................................................................... 45


4.3 Timer Management Hardware Architectures................................................... 46
4.3.1 Heap-based ............................................................................................. 46
4.3.2 CAM-based.............................................................................................. 47
4.4 Priority Queuing and Timer Management ....................................................... 48
4.5 Test Tool Timer Management ......................................................................... 49
4.5.1 Timer Wakeup Times vs. Counters ......................................................... 49
4.5.2 Distributed Timer Management ............................................................... 50
4.5.3 Distributed Heaps Timer Management.................................................... 51
4.5.4 CAM-based Timer Management ............................................................. 53
4.5.5 Timer Manager Architecture ................................................................... 54
5. Field Programmable Gate Arrays............................................................................ 55
5.1 Test Equipment Hardware Flexibility ............................................................... 55
5.2 Field Programmable Gate Arrays.................................................................... 55
5.2.1 FPGA architectures ................................................................................. 55
5.3 FPGAs Building Modules ................................................................................ 56
5.3.1 Logic Blocks ............................................................................................ 56
5.3.2 Embedded RAM and Multipliers .............................................................. 57
5.3.3 Clocking Scheme..................................................................................... 58
5.3.4 Clock Managers....................................................................................... 59
5.3.5 General-purpose I/O Pins........................................................................ 60
5.3.6 Gigabit Transceivers................................................................................ 60
5.3.7 Embedded Processor Cores ................................................................... 61
5.3.8 Programmable Interconnects .................................................................. 61
5.4 Xilinx Virtex-II FPGAs...................................................................................... 61
5.4.1 Configurable Logic Blocks ....................................................................... 61
5.4.2 Block SelectRAM..................................................................................... 62
5.4.3 Multipliers ............................................................................................... 62
5.4.4 Clocking................................................................................................... 63
5.4.5 Input/Output Blocks ................................................................................. 63
5.4.6 Routing Resources .................................................................................. 63
5.5 Test Tool Virtex-II Device ................................................................................ 64
6. Gi Interfacing and Measuring .................................................................................. 66
6.1 OSI Reference Model, Encapsulation and Decapsulation .............................. 66
6.2 Gi Interfacing and Measuring System ............................................................. 67
6.2.1 Configuration and Initialization ................................................................ 68
6.3 Information Organization and Memory Addressing ......................................... 69
6.4 Configurable Architectures .............................................................................. 70
6.5 Generator Hardware Architecture ................................................................... 70
6.5.1 Building Blocks ........................................................................................ 71
6.5.2 Ticks Counter .......................................................................................... 72
6.5.3 Traffic Profile Engine ............................................................................... 73
6.5.4 User Profile Engine.................................................................................. 74
6.5.5 Transmit Scheduler ................................................................................. 75
6.5.6 Transmit Data Pump................................................................................ 76
6.6 Analyzer Hardware Architecture...................................................................... 78
6.6.1 Building Blocks ........................................................................................ 79
6.6.2 Receive Data Pump................................................................................. 80
6.6.3 Statistics Fields Manager ........................................................................ 81
6.6.4 Statistics Fields Parser ............................................................................ 82


6.6.5 Statistics Engine ...................................................................................... 84


6.7 Statistics Fields and Counters ......................................................................... 84
6.7.1 Internet Protocol Statistics....................................................................... 85
6.7.2 User Datagram Protocol Statistics........................................................... 86
6.7.3 L2TP Statistics......................................................................................... 86
6.7.4 PPP statistics........................................................................................... 87
6.7.5 User Internet Protocol Statistics .............................................................. 88
6.7.6 Ethernet statistics .................................................................................... 88
6.7.8 Statistics Gathering Example .................................................................. 90
7. Architectural Optimization ...................................................................................... 92
7.1 Performance Requirements ............................................................................ 92
7.1.1 Throughput Requirements....................................................................... 92
7.1.2 Test Tool Constraints .............................................................................. 92
7.2 Throughput Analysis........................................................................................ 92
7.3 Generator Tuning ............................................................................................ 94
7.4 Analyzer Tuning .................................................................................................... 95
8. Implementation and Testing.................................................................................... 96
8.1 HDL-based Design Flows................................................................................ 96
8.2 Xilinx Integrated Software Environment .......................................................... 96
8.3 Xilinx Implementation Process ........................................................................ 97
8.4 Implementation Goals ..................................................................................... 97
8.5 Electronic Design Automation Tools ............................................................... 98
8.6 Testing............................................................................................................. 98
8.6.1 Testing Architecture ....................................................................................... 99
8.6.2 Generator Verification.............................................................................. 99
8.6.3 Analyzer Verification.............................................................................. 100
8.6.4 Test Data Acquisition............................................................................. 100
8.7 UMTS QoS classes ....................................................................................... 101
8.8 Example Applications, Data Rates and Packet Sizes ................................... 102
8.7 Simulation...................................................................................................... 104
8.8 Synthesis....................................................................................................... 105
8.8 Implementation.............................................................................................. 106
8.9 Architecture Throughput Limitations.............................................................. 107
9. Epilogue .................................................................................................................... 110
9.1 Conclusions................................................................................................... 110
9.2 Future Work................................................................................................... 110
Appendix A.................................................................................................................... 112
A.1 Asynchronous Transfer Mode Protocol ......................................................... 112
A.2 ATM Service Categories ............................................................................... 113
A.3 ATM Adaptation Layers................................................................................. 114
A.4 ATM and the UMTS Network ........................................................................ 115
Appendix B.................................................................................................................... 117
B.1 GSM Phase 2+ Core Network Basic Entities and Interfaces......................... 117
B.2 CS and PS Domains Common Entities ......................................................... 117
B.3 CS Domain Entities ....................................................................................... 118
B.4 PS Domain Entities ....................................................................................... 119
B.5 CS and PS Domain Common Interfaces ....................................................... 119
B.6 CS Domain Interfaces ................................................................................... 119
B.7 PS Domain Interfaces ................................................................................... 120
B.8 External Networks Interfaces ........................................................................ 120


Appendix C ................................................................................................................... 123


C.1 PCI Industrial Computer Manufacturers Group ............................................. 123
C.2 Compact Peripheral Component Interface .................................................... 123
C.3 cPCI vs PCI ................................................................................................... 124
C.4 cPCI System Architecture ............................................................................. 126
C.4.1 cPCI Enclosures .................................................................................... 127
C.4.2 Backplane.............................................................................................. 128
C.4.3 Single Board Computer ......................................................................... 130
C.4.4 Peripheral Boards.................................................................................. 131
C.5 cPCI bus........................................................................................................ 131
C.6 Hot Swap....................................................................................................... 132
C.7 cPCI Benefits Summary ................................................................................ 135
References.................................................................................................................... 137


Figures
Figure 1-1 Protocol stack emulation/simulation .............................................................. 14
Figure 1-2 UMTS network architecture high-level view................................................... 15
Figure 2-1 UMTS Universal Terrestrial Radio Access Network ...................................... 20
Figure 2-2 Intra and Inter-network backbones ................................................................ 24
Figure 2-3 PDP context activation procedure ................................................................. 25
Figure 2-4 Typical Gi interface protocol stack................................................................. 27
Figure 2-5 Packet transfer and related protocol stacks................................................... 29
Figure 3-1 Test tool application configuration sequence and operation.......................... 34
Figure 3-2 Ethernet frame structure ................................................................................ 39
Figure 3-3 Test tool Gi system block diagram................................................................. 42
Figure 4-1 Timer state transition diagram ....................................................................... 44
Figure 4-2 Timers organized in a heap with degree d=3 ................................................ 47
Figure 4-3 CAM-based timer management..................................................................... 48
Figure 4-4 Distributed timer management architecture................................................... 50
Figure 4-5 Timer manager heap organization................................................................. 52
Figure 5-1 LUT logic function implementation................................................................. 57
Figure 5-2 Clock tree example ........................................................................................ 58
Figure 5-3 Clock manager frequency synthesis and phase shifting................................ 59
Figure 5-4 Clock de-skewing and jitter removal .............................................................. 60
Figure 5-5 Virtex-II routing resources.............................................................................. 63
Figure 5-6 Xilinx hierarchical routing resources .............................................................. 64
Figure 6-1 The concept of encapsulation........................................................................ 66
Figure 6-2 Transmit traffic events, user profiles and traffic profiles ................................ 67
Figure 6-3 Generator block diagram ............................................................................... 72
Figure 6-5 Analyzer block diagram ................................................................................. 79
Figure 6-7 Gi protocol stack incorporating L2TP and PPP ............................................. 85
Figure 6-6 Internet Protocol header ................................................................................ 90
Figure 6-7 Packet header word instance ........................................................................ 90
Figure 8-1 Xilinx design flow ........................................................................................... 97
Figure 8-2 Generic test bench block diagram ................................................................. 99
Figure 8-3 Packet header collection setup.................................................................... 101
Figure A-1 AAL2 vs AAL5 process................................................................................ 114
Figure A-2 UTRAN ATM protocol stacks ...................................................................... 116
Figure B-1 UMTS network architecture......................................................................... 122
Figure C-1 3U, 6U cPCI format and corresponding connectors................................... 126
Figure C-2 cPCI backplane with 7 peripheral slots ....................................................... 129
Figure C-3 cPCI bus architecture.................................................................................. 132


Tables
Table 5-1 Block SelectRAM single and dual-port configurations .................................... 62
Table 6-1 IP header fields and related counters ............................................................. 85
Table 6-2 UDP header fields and related counters ......................................................... 86
Table 6-3 L2TP header fields and related counters ........................................................ 86
Table 6-4 PPP header Protocol field values ................................................................... 87
Table 6-5 PPP header fields and related counters ......................................................... 88
Table 6-6 IP header fields and related counters ............................................................. 88
Table 6-7 IXF440 status word ......................................................................................... 89
Table 6-8 IXF440 status word fields and related statistics.............................................. 89
Table 6-9 IP statistics example ....................................................................................... 91
Table 8-1 UMTS packet-based applications and data rates ......................................... 102
Table 8-2 UMTS multimedia applications characteristics ............................................. 103
Table 8-3 Multimedia applications and timer durations................................................. 103
Table 8-4 Simulation scenario....................................................................................... 104
Table 8-6 Generator synthesis results .......................................................................... 106
Table 8-7 Analyzer synthesis results ............................................................................ 106
Table 8-8 Generator implementation results................................................................. 107
Table 8-9 Analyzer implementation results ................................................................... 107


1. Introduction
1.1 3rd Generation Partnership Project
The 3rd Generation Partnership Project (3GPP) [1] is responsible for the formation of the
3rd generation of mobile telecommunications system¹ specifications. Its scope is to
define an open standards mobile network architecture, which will ensure global roaming
and circulation of terminals. 3GPP was the result of a collaboration agreement² between
a number of telecommunications standards bodies, such as the European
Telecommunications Standards Institute and the China Communications Standards
Association, referred to as Organizational Partners. Once defined, the 3GPP technical
specifications are transposed by the Organizational Partners into standards.
Furthermore, the 3GPP is also responsible for the development and maintenance of
evolved radio access technologies related to packet based services - e.g. General
Packet Radio Services and Enhanced Data rates for GSM Evolution.

1.2 GPRS and UMTS


The General Packet Radio Services (GPRS) network was a breakthrough in mobile
telecommunications, due to the fact that it allowed direct access to Packet Data
Networks (PDNs), such as the Internet or proprietary private packet-switched networks.
Prior to GPRS, mobile subscribers only had access to circuit-switched services, which
are inherently related to voice transfer. The revolution of GPRS is what is known as the
“always on” connectivity; mobile subscribers are constantly connected to the packet data
network, but only consume bandwidth when they request packet-switched services.
GPRS enables mobile subscribers to access both voice and data services anytime,
anyplace.

GPRS allowed network operators to offer value-added packet-based services to mobile
subscribers, the most important being web browsing and electronic mail. Though
GPRS was a very good starting point for the introduction of such services in mobile
telecommunication networks, it soon proved inadequate due to its bandwidth
limitations. This fact, in conjunction with technology advances, especially in the area
of multimedia, made GPRS a bottleneck for the range of services that network
operators and service providers wished to deploy in mobile networks.

The Universal Mobile Telecommunications System (UMTS) - also referred to as 3G -
was the next step in the evolution of mobile communications, addressing the bandwidth
limitations of GPRS. The ultimate goal of UMTS is to accelerate convergence and
integration between telecommunications, Information Technology and service providers
in order to deliver high-capacity mobile communications at a low cost. The ambition of
UMTS is to achieve data rates as high as 2 Mbps per subscriber, facilitating the
deployment of mobile multimedia services such as MP3 downloading and video
streaming, as well as mobile office applications. As a matter of fact, during the last few
years, a large number of 3G networks have been deployed and a plethora of services,
including those previously mentioned, are being offered to subscribers.

¹ The 3rd generation mobile telecommunications system is known as the Universal
Mobile Telecommunications System (UMTS).
² The collaboration agreement was established during a meeting held in Copenhagen
on December 2-4, 1998.

One of the major strengths of UMTS is that it does not define an entirely new
architecture, but incorporates elements from existing architectures instead, mainly
GPRS in the packet switched domain and Global System for Mobile Communications
(GSM) in the circuit switched domain. Differentiation comes in the air interface and the
data transmission technology. This approach enables operators to build 3G networks on
top of their existing infrastructure at an improved cost-efficiency.

1.3 Mobile Networks Test Tools


Test tools have always played an important role in the deployment of mobile networks.
Mobile networks evolution was based on the introduction of new interfaces and
protocols, along with corresponding new network elements which implemented them.
The aforementioned interfaces and protocols establish an open standards architecture,
thus facilitating the implementation of a mobile network using telecommunication
equipment from different vendors. However, with an open standards architecture comes
the issue of equipment vendors interpreting the specifications in varying ways and
therefore ending up with different implementations.

One of the main uses of test tools is to provide a common ground for vendors to test the
conformance of their equipment to the standards, thus allowing the seamless integration
of physical devices originating from different vendors. Additionally, test tools have
traditionally been used for equipment stress testing and performance evaluation. Stress
testing refers to the generation of network traffic up to the actual link rate, in order to
estimate the throughput of the device along with its behavior under heavy loads. On the
other hand, performance evaluation refers to equipment errors identification along with
real-time Quality of Service (QoS) measurements for performance benchmarking. The
types of QoS measurements are usually decided within the context of a particular
application; round-trip delay times and packet loss percentage are the ones commonly
used.
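To make the two common QoS metrics concrete, the following sketch computes packet loss percentage and round-trip delay statistics from a test run. This is purely illustrative; the values, function names and packet counts are hypothetical and not taken from the test tool.

```python
# Illustrative sketch (not part of the test tool): computing two common
# QoS metrics from a test run. All values are hypothetical.

def packet_loss_percentage(sent: int, received: int) -> float:
    """Fraction of transmitted packets that never arrived, as a percentage."""
    if sent == 0:
        return 0.0
    return 100.0 * (sent - received) / sent

def round_trip_stats(rtts_ms: list[float]) -> dict:
    """Minimum, average and maximum round-trip delay in milliseconds."""
    return {
        "min": min(rtts_ms),
        "avg": sum(rtts_ms) / len(rtts_ms),
        "max": max(rtts_ms),
    }

loss = packet_loss_percentage(sent=10_000, received=9_950)   # 0.5 percent
stats = round_trip_stats([12.1, 13.4, 11.8, 15.0])
```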

In general, a “good” test tool is distinguished by the following characteristics:

• compliance with all standards it supports, so as to be able to resolve
  interoperability issues between vendors,

• reliability and the ability to reproduce previous outputs, in order to allow operators
  to repeat test scenarios,

• flexibility, which will enable operators on the one hand to change protocol
  versions, stacks and test scenarios and on the other hand to adapt the
  instrument to evolving standards,

• an easy to understand and use configuration environment, hiding hardware
  complexity from operators, and a number of advanced analysis functions, e.g.
  intelligent filters to isolate particular types of traffic.


Test tools incorporating the aforementioned features can assist equipment vendors to
accurately verify their products' functionality and performance, and network operators to
integrate equipment coming from different manufacturers. Furthermore, they provide
operators with a valuable network planning utility, giving them the opportunity to examine
network behavior and performance under various configurations.

1.4 Test Tools Measurement Approaches


Test tools provide a variety of features and can create customized test scenarios, based
on the needs of the system to be tested. The basic measurement principles they
incorporate are [2]:

• Monitoring, which is applied whenever there is a need for collection of
  measurements in a live network.

• Simulation/Emulation, during which the test tool imitates the behavior of an active
network entity, usually sending and receiving traffic from/to other network entities
and collecting information requested by the operator.

Monitoring is commonly translated into the process of collecting data from a network
interface. It has always been a valuable tool for vendors and network administrators,
mainly used to measure the actual performance and improve behavior predictions of the
system under test. Data collected throughout the monitoring interval are processed to
produce statistics - for example the number of occurrences of a particular event - and
can be stored for further analysis. A very useful complement to monitoring is data
filtering, which enables the operator to isolate specific data streams, e.g. all traffic
destined for a particular mobile subscriber.
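The combination of counting event occurrences and filtering by subscriber can be sketched as follows. The packet representation (dictionaries with "dst" and "length" keys) and all addresses are simplifications invented for this example; real monitors operate on captured frames.

```python
# Illustrative sketch: a monitor that counts packets and bytes on a
# captured stream, optionally filtered down to one subscriber's traffic.

from collections import Counter

def monitor(packets, subscriber=None):
    """Count packets and bytes, optionally filtered by destination address."""
    stats = Counter()
    for pkt in packets:
        if subscriber is not None and pkt["dst"] != subscriber:
            continue  # filter: drop traffic for other subscribers
        stats["packets"] += 1
        stats["bytes"] += pkt["length"]
    return stats

capture = [
    {"dst": "10.0.0.1", "length": 512},
    {"dst": "10.0.0.2", "length": 1500},
    {"dst": "10.0.0.1", "length": 64},
]
all_traffic = monitor(capture)                         # 3 packets, 2076 bytes
one_user = monitor(capture, subscriber="10.0.0.1")     # 2 packets, 576 bytes
```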

Simulation is the concept of imitating a real device, by representing certain features of
the behavior of the device. In a test environment, simulators are used to produce desired
conditions. For instance, our test tool acts as a simulator of the Public Data Network
towards the Gateway GPRS Support Node (GGSN), in order to examine its reaction to
faulty messages, amongst other things. In the above example, the desired condition is
the transmission of faulty messages to the GGSN. Simulation is commonly used to
substitute network elements or even whole parts of a network during the network
development process. Emulation also imitates a real device, but differs from simulation
in that it does not attempt to precisely model the device. The emulation
objective is to reproduce the device behavior. Within the context of test tools emulation
is a higher form of simulation, where particular features of a device are simulated
automatically and in conformance with standards. In order for a particular feature to be
simulated, test tools emulate all features the simulated one depends on. An example of
the simulation/emulation concept is protocol implementation testing. Protocol stacks
consist of layers, with higher layers using the services offered by the lower ones.
Therefore, the implementation of a specific layer protocol can be tested by simulating the
specific protocol and emulating the ones it is built on (Figure 1-1).


Figure 1-1 Protocol stack emulation/simulation: the layer x protocol is simulated, while
the underlying layer x-1, x-2 and x-3 protocols are emulated.

Simulation is used to produce the “desired conditions” for the particular protocol,
whereas emulation isolates the protocol from the details of the underlying protocols.
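The layering of Figure 1-1 can be sketched in a few lines: the layer under test is simulated in full, while the layers beneath it are emulated only far enough to offer the services it depends on. The layer names and the header strings are invented purely for illustration.

```python
# Conceptual sketch of Figure 1-1: layer x is simulated over emulated
# layers x-1 .. x-3. Each emulated layer merely frames the payload and
# passes it down, reproducing external behavior without detailed modelling.

class EmulatedLayer:
    """Reproduce a lower layer's external behavior: frame and pass down."""
    def __init__(self, name, lower=None):
        self.name, self.lower = name, lower
    def send(self, payload):
        framed = f"{self.name}[{payload}]"
        return self.lower.send(framed) if self.lower else framed

class SimulatedLayer:
    """The protocol under test, modelled in detail (here: trivially)."""
    def __init__(self, lower):
        self.lower = lower
    def send(self, data):
        return self.lower.send(f"x[{data}]")

# Stack of Figure 1-1: simulated layer x over emulated layers x-1 .. x-3.
stack = SimulatedLayer(EmulatedLayer("x-1", EmulatedLayer("x-2", EmulatedLayer("x-3"))))
frame = stack.send("payload")   # nested framing: x-3[x-2[x-1[x[payload]]]]
```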

Typically, simulation/emulation is used in conjunction with monitoring, in order to collect
statistics which represent the reaction of the system under test to the specified
conditions. When dealing with mobile networks testing, the system under test is a
network element. Network elements are physical pieces of equipment that implement
certain aspects of the network functionality and use standard interfaces to communicate
with other network elements. The network architecture defines both the functionality and
the interfaces a particular element should implement.

1.5 UMTS Network Architecture - High Level View


Since our thesis project is related to UMTS, we shall provide an abstract view of the
UMTS network architecture to further understand the functionality of UMTS network
elements and the way test tools imitate their behavior.

The basic concepts of the UMTS architecture are functional groups and reference
points. A functional group encloses a number of functions each one implementing part of
a set of services, whereas reference points are conceptual points separating functional
groups. All functional groups present in the UMTS network architecture cooperate in
order to provide the full range of UMTS services, i.e. circuit and packet switching,
roaming and mobility management, short messaging, etc. Since the intelligence for the
particular services is distributed across the mobile network, a functional group inter-
communication mechanism is necessary. This is realized over the network reference
points, using well-defined interfaces. In the case of UMTS, functional groups consist of
network elements - the actual physical equipment - and interfaces are formed by
protocol stacks running on the physical equipment. The functions of a functional group
may be performed by one or more network elements.

There are three functional groups in a UMTS network and a quite large number of
reference points [3]. For the purposes of our discussion we will only present the basic
ones. The first functional group is realized by the User Equipment (UE). The UE is the
actual handheld device and includes functions such as radio termination and
transmission from/to the network elements, user authentication on the subscriber side,
mobility management, hardware management and the hosting of user applications. The
second functional group consists of the Access Network (AN). The AN handles the


operation and maintenance of the radio access network. Its major responsibility is to
allocate and manage “transmission/reception channels” to mobile subscribers as well as
to perform the specific functions of the access technique. The Core Network (CN) is the
functional group which provides support for the actual network features and
telecommunication services. CN duties include management of user location information
- utilized by roaming services - and the switching and transmission mechanisms used to
transfer signaling and/or user generated information.

Figure 1-2 UMTS network architecture, high-level view: the User Equipment, Access
Network and Core Network communicate over the Uu and Iu reference points, with the
Gi reference point connecting the Core Network to external PDNs.

These mechanisms are differentiated in circuit and packet switched networks, leading to
a further functional separation of the CN into a Circuit-Switched (CS) domain and a
Packet-Switched (PS) domain. The CS domain is delegated services related to voice
transfer and the PS domain handles services related to data transfer. The CN is also
interconnected with external packet and circuit switched networks in order to enrich the
range of services offered to mobile subscribers.

All of the aforementioned functional groups communicate with each other over the
conceptual points (Figure 1-2). The Uu is the reference point between the UE and the
AN, whereas the Iu is the one between the AN and the CN. Finally, the CN
communicates with external packet networks over the Gi reference point. Functions of
functional groups are implemented in network entities which are defined in the UMTS
specification, along with the protocol stacks that they should utilize. As functionality is
distributed in network entities, the actual UMTS architecture is far more complex than
the one presented here, incorporating interfaces between entities of the same and
different functional groups. For example, the separation of the CN into a PS domain and
a CS domain results in additional interfaces being defined for intercommunication of
entities belonging to the different domains. Presenting the whole range of conceptual
points and entities is outside the scope of this document. However, throughout the
second chapter we shall present the UMTS basic network entities and their
corresponding interfaces, focusing on the GPRS Gateway Support Node and the Gi
interface.

1.6 Test Tools Operating Principles


Test tools functionality can be briefly summarized as the “impersonation” of a network
entity based on the simulation of:

• the functions the particular entity should provide,


• the interfaces of the entity to the network element under test.

In essence, test tools use a combination of simulation and emulation to imitate the
behavior of a network element. The network entities under test realize the existence of
the elements they communicate with through their interfaces and the messages they
exchange over them. The message structure is defined by means of protocols, with each
protocol using the services of lower-layer protocols. The result is that the reference points
of the network architecture are implemented as protocol stacks. Test tools “impersonate”
network entities by simulating the protocol stack the entity would use to communicate
with the device under test. They are able to construct and transmit as well as receive
and parse packets utilizing the particular protocols. The basic functionality of a test tool
is limited to the formation and transmission of packets on the egress side and the
collection of statistics based on packets received on the ingress side. However, there are
more sophisticated test tools which allow the simulation of a complete call establishment
message sequence. In order to deliver such functionality, they generate packets
according to the packets they receive during their “dialog” with the device tested. The
aim of the test equipment is not to provide the functions of the device it impersonates but
to simulate its external behavior instead.

1.7 Thesis Overview


The scope of the present thesis project was to design and implement a UMTS Gi
interfacing and measuring system to be used for stress testing of networking equipment
and network links, as well as for networking equipment protocol conformance testing.
The system should be able to provide four 100 Mbps network links with constant bit rate
traffic on the egress side, as well as maintain statistics on the aggregate 400 Mbps
network traffic on the ingress side. Additionally, the overall architecture should be
configurable in terms of hardware resources utilization and total bandwidth served, due
to the fact that there are plans to utilize the Gi system in more demanding environments.
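The throughput figures above translate directly into generator timing: for constant bit rate traffic, frames must be spaced so that frame bits divided by the inter-frame interval equals the target rate. The arithmetic below is a back-of-the-envelope sketch; the 1250-byte frame size is a hypothetical choice, not a parameter of the actual test tool.

```python
# Back-of-the-envelope sketch: inter-frame interval needed for constant
# bit rate traffic on one link. The frame size is hypothetical.

def cbr_interval_us(frame_bytes: int, rate_bps: float) -> float:
    """Time between successive frame starts, in microseconds."""
    return frame_bytes * 8 / rate_bps * 1e6

# Filling a 100 Mbps link with 1250-byte frames:
interval = cbr_interval_us(1250, 100e6)   # 100 us, i.e. 10,000 frames/s
aggregate = 4 * 100e6                     # four links: 400 Mbps on ingress
```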

With respect to the UMTS network architecture, the system should simulate the
functionality of a Public Data Network towards a Gateway GPRS Support Node (GGSN),
utilizing the protocol stack of the Gi interface. The particular system will be embedded in
a commercially available UMTS test tool, developed by Keletron. The Keletron test tool -
from now on referred to as “test tool” - functionality is not limited to the emulation of the
Gi interface. It has been designed in a modular fashion to easily and effectively facilitate
operation in a variety of UMTS related applications as well as locations within the overall
network topology. The Gi system will be part of a range of systems offering similar
features for a number of interfaces.

The test tool incorporates flexibility in its hardware architecture, necessary for it to be
highly customizable and configurable. The need to emulate various interfaces of the
UMTS network requires the capability to accommodate a wide range of protocols and
different data transfer mechanisms. A key element to achieving such flexibility is the use
of Field Programmable Gate Arrays (FPGAs). FPGAs are reconfigurable hardware,
which can be tailored to the needs of the target application, thus avoiding the design of
different versions of the hardware for each particular application. The test tool
incorporates two FPGAs, which belong to the Virtex-II family of Xilinx FPGAs. They


contain all hardware logic required to construct and transmit data and signaling packets,
whose format is based on the protocol stack emulated. The logic for the various
interfacing and measuring systems, including the Gi, is implemented on the two FPGAs
present on the test tool board.

Our hardware design was implemented using the Very high speed integrated circuits
Hardware Description Language (VHDL). Subsequently, the implementation was simulated
to examine the system behavior and went through the synthesis procedure, in order to
be available in a format suitable for FPGA place-and-route tools to process. Finally, the
design was placed and routed in the target FPGA, verifying that the overall requirements
in terms of FPGA area occupation and speed were met.

The objective of this document is to provide the reader with an understanding of the
overall workings of the test tool and the system designed in particular. Therefore, more
emphasis will be put on general principles of the hardware architecture, leaving aside
specialized implementation details. Chapter 2 contains an overview of the UMTS and
GPRS network architecture, focusing on the interworking between GPRS and IP-based
networks. In chapter 3 we start digging deeper into the test tool, presenting an overview
of its configuration process and hardware architecture. Chapter 4 contains an overview
and comparison of timer management architectures studied for the purposes of the
thesis. Chapter 5 is related to FPGAs, presenting the wide range of features they offer. It
is an introductory chapter to the implementation of our design on the Xilinx FPGA.
Chapter 6 describes a parameterized Gi system egress and ingress side hardware
design, whereas chapter 7 analyzes the test tool constraints and requirements,
concluding with the instance of the system hardware architecture. Chapter 8 describes
the Gi system verification process along with the results of the thesis project. Finally,
chapter 9 presents the overall conclusions of the work performed along with future
considerations.

Last but not least, let us note that we will include a brief description of standards
committees where appropriate. Standardization organizations are the main enablers of
open architectures, upon which the world of communications is based. Open
architectures establish and promote interoperability. The task of the standardization
groups is to design and approve specifications globally applicable in order to isolate
implementation details from the external behavior of an entity. We have already
presented such a standardization committee in this chapter, the 3rd Generation
Partnership Project.


2. UMTS and GPRS Networks


2.1 UMTS Building Elements
The UMTS network is not an entirely new architecture. It utilizes and expands the
functionality of already deployed networks, mainly the Global System for Mobile
communications (GSM) and the General Packet Radio Services (GPRS) networks. GSM
was one of the many 2nd Generation (2G) cellular systems that emerged during the
1990s as an evolution of 1G systems, which were only capable of transmitting analog
voice data. 2G systems introduced security features such as voice encryption and fraud
protection, as well as short messaging services. Amongst the 2G standards, GSM was
the most successful one, supporting more than half the world’s mobile subscribers with
international roaming in 140 countries and 400 networks [2]. The GSM standard is
developed and maintained by the European Telecommunications Standards Institute.

The original release of the GSM standard, apart from security and short messaging
services, provided very few supplementary services, the most important of which being
the ability to transmit data at rates up to 9.6 kbps. Phase 2 of GSM standardization
addressed this issue by incorporating supplementary services comparable to those of the
digital fixed-network Integrated Services Digital Network (ISDN) standards. However, the
actual breakthrough was achieved with the Phase 2+ GSM specification, which introduced
important 3G features, e.g. high data rate services, enhanced speech
compression/decompression, Intelligent Network (IN) services - Customized Application
for Mobile Enhanced Logic (CAMEL)3 - and new transmission principles for packet
switched services - GPRS, Enhanced Data rates for GSM Evolution (EDGE) and High
Speed Circuit-Switched Data (HSCSD).

UMTS is in essence a successor to the GSM standard that is backward compatible with
GSM, using the GSM Phase 2+ enhanced Core Network incorporating GPRS and
CAMEL capabilities. The differentiation of UMTS comes in terms of a new radio access
network, referred to as UMTS Terrestrial Radio Access Network (UTRAN). The aim of
the UTRAN is to accommodate the new principles for air interface transmission utilized
by UMTS. GSM networks use a combination of Time Division Multiple Access and
Frequency Division Multiple Access, whereas UMTS introduced the Wide-band Code
Division Multiple Access. The CN of GSM Phase 2+ is used almost as is, requiring minor
modifications to interface with the UTRAN. Further information on the GSM Phase 2+
CN can be found in Appendix B.

UTRAN is subdivided into Radio Network Systems (RNSs). An RNS consists of base
station equipment - transceivers, controllers etc. - and is responsible for the allocation and
release of specific radio resources in order to establish a connection between a User
Equipment and the UTRAN. It thus enables Core Network elements to communicate
with mobile stations. The new network elements comprising RNS base station
equipment are:

3
The main scope of the CAMEL subsystem is to provide the mechanisms to support services consistently,
even if the user is not located within his/her operator network.


• the Radio Network Controller (RNC)

• Node B

The RNC enables autonomous radio resources management by the UTRAN. Additionally, it
handles protocol exchanges between its interfaces and is responsible for centralized
operation and maintenance of the entire RNS. Practically speaking, the RNC controls
the use and reliability of the radio resources. One RNC may control one or more Node
Bs. The UMTS standard defines three types of RNCs:

• Serving RNC (SRNC)


The RNC of the RNS through which the UE originally attached to the mobile network.
The SRNC is responsible for the user mobility within the UTRAN as well as for
providing the UE with a point of connection towards the CN.

• Drift RNC (DRNC)


We say that a UE has drifted when it is handed over to a cell associated with a
different RNS than the one it originally attached to. The DRNC is the RNC of the
new RNS the UE has drifted to. Its responsibility is to route information between
the SRNC and the UE, acting like a switch.

• Controlling RNC (CRNC)


A CRNC handles the configuration of a Node B. When a UE wishes to access
the mobile network, it will send an access request to a Node B, which in turn will
forward it onto its CRNC.

Node B is the physical unit for radio transmission/reception within cells; it is the actual
termination point of the radio interface. Depending on the mobile network sectoring, one
or more cells may be served by a Node B. Its major duties include the conversion of
data from/to the radio interface, the measurement of the quality and strength of the
connection and the power control4 of the UE. Node B transmits periodic reports to its
CRNC; the CRNC translates the particular reports into appropriate control actions and
instructs Node B to perform them. A typical example of the aforementioned scenario is
power control, where Node B collects information on transmission power control from the
UE and enables it to adjust its power using transmission power control commands. The
Node B transmission power control commands are decided on a threshold basis, where
threshold values are determined by the RNC using information retrieved from Node B.
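The threshold-based decision described above can be sketched in a few lines: Node B compares a measured link quality against thresholds set by the RNC and issues "up", "down" or "hold" transmit power control commands to the UE. The quality metric and all threshold values are invented for illustration; the actual 3GPP power control loops are considerably more involved.

```python
# Sketch of threshold-based transmit power control. Metric and
# threshold values are hypothetical.

def tpc_command(quality_db: float, low_db: float, high_db: float) -> str:
    """Decide a transmit power control command from a quality measurement."""
    if quality_db < low_db:
        return "up"      # link too weak: raise UE transmit power
    if quality_db > high_db:
        return "down"    # link stronger than needed: save UE battery and
                         # reduce interference to neighboring cells
    return "hold"

cmd_weak = tpc_command(quality_db=-12.0, low_db=-10.0, high_db=-6.0)    # "up"
cmd_strong = tpc_command(quality_db=-4.0, low_db=-10.0, high_db=-6.0)   # "down"
```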

The UTRAN introduces four new reference points to enable communication between the
UE, Node B, the RNC and the CN (Figure 2-1). Specifically, the reference points defined
are:

• Uu, interface of the UE towards Node B.

• Iu, RNC to CN interface. Recall that the CN is further divided into a Circuit-
Switched and a Packet-Switched domain. Consequently, the Iu interface consists

4
Power control refers to the increase/decrease of the UE transmit power. The further a UE is from a Node
B, the more transmit power it has to dissipate in order to have access to the mobile network.


of two distinct reference points: Iu-CS, for communication of the RNC with the CS
domain, and Iu-PS, for communication of the RNC with the PS domain.

• Iub, RNC to Node B interface.

• Iur, RNC to RNC interface.

Figure 2-1 UMTS Terrestrial Radio Access Network: UEs connect to Node Bs over the
Uu interface, Node Bs connect to their RNC over Iub, the RNCs of different RNSs
interconnect over Iur, and each RNC connects to the CN CS and PS domains over the
Iu-CS and Iu-PS interfaces respectively.

The Uu incorporates the actual Wide-band Code Division Multiple Access (W-CDMA)
based radio interface, whereas the Iu, Iur and Iub interfaces are built on the
Asynchronous Transfer Mode (ATM) transmission principles. Presenting the details of the
W-CDMA data modulation mechanisms is outside the scope of this document; however, an
overview of the ATM protocol along with a discussion on the reasons that led 3GPP to
select it as the UMTS data link technology can be found in Appendix A.

2.2 The GPRS Network


The goal of the present thesis project is the implementation of a UMTS Gi interfacing
and measuring system. As described earlier, the Gi interface is the reference point
between the UMTS network and external public or private packet-switched networks.
The GGSN is located at the boundaries of the UMTS network, acting as a single
entry/exit point for traffic originating from/destined to the specific packet-switched
network the GGSN interfaces to. The communication takes place over the Gi reference
point, utilizing a well-defined protocol stack. As the GGSN is part of the GPRS network,
we will provide an overview of GPRS architecture, the functionality it provides and the


way interoperability is achieved between legacy Packet Data Networks (PDNs) and
GPRS.

The GPRS network is part of the UMTS CN; GPRS entities constitute the UMTS CN
Packet-Switched domain. UMTS interfaces to external PDNs by means of the Gi
interface and to PS domains of other UMTS networks through the Gp interface. The Gp
interface interconnects two GPRS packet domains belonging to different operators and
requires that the operators have signed some kind of mutual agreement.

GPRS is delegated a set of logical functions within the PS domain, so as to implement
packet-switched services. The set of GPRS functions can be categorized as follows:

• Network access control.

• Packet routing and transfer.

• Mobility management.

• Logical link management.

• Radio resource management.

• Network management.

Providing a detailed description of all the aforementioned functions is outside the scope
of this document. Readers interested should refer to [3]. However, as the interworking
between Packet Data Networks (PDNs) and UMTS is of increased importance for our
thesis project, we will describe those functions that are closely related to packet-
switched functionality, packet routing and transfer.

2.2.1 Packet Routing and Transfer Functions


Packet routing and transfer refer to the procedures necessary for packets to be
forwarded from the originating to the destination node, whether that is within the same or
different networks. The ordered list of nodes the packet traverses to reach its destination
is called a route, while the process of determining and using a route according to a
predefined set of rules is called routing. Recall from data packet networks theory that a
route consists of the originating node, zero or more intermediate nodes - a.k.a. relay
nodes - and the destination node.

Packet routing and transfer is further decomposed into a number of lower-level functions:

• relaying,

• address translation and mapping,

• encapsulation,

• tunneling,


• domain name address resolution,

• compression,

• ciphering and data integrity protection.

The relay function is responsible for forwarding packets received by one node to the next
node in the route. The routing process determines the next network node to which a
packet should be forwarded, using the packet destination address in conjunction with the
node routing table. The routing function need not differ from those already deployed
across PDNs. The specifications clearly state that data transmission
between GPRS network nodes may occur across PDNs that provide their own internal
routing functionality, such as ATM, Frame Relay or Internet Protocol networks.
Furthermore, address translation and mapping are utilized by the routing process. The
address translation function converts a PDN address into a proprietary address used
within the mobile network to identify a network node and/or a mobile subscriber,
whereas the address mapping function maps a network address to another network
address of the same type.
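The relay step described above can be sketched as a routing table lookup: the node matches the packet's destination address against its table entries and, among all matching entries, forwards towards the one with the longest prefix. The table entries, next-hop names and addresses below are entirely hypothetical.

```python
# Minimal sketch of next-hop selection by longest-prefix match.
# Table entries and next-hop names are hypothetical.

import ipaddress

ROUTING_TABLE = {
    ipaddress.ip_network("10.1.0.0/16"): "ggsn-1",
    ipaddress.ip_network("10.1.2.0/24"): "sgsn-2",
    ipaddress.ip_network("0.0.0.0/0"): "default-gw",
}

def next_hop(dst: str) -> str:
    """Return the next hop for a destination address."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTING_TABLE if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return ROUTING_TABLE[best]

hop = next_hop("10.1.2.7")   # the /24 entry beats the /16 entry
```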

Encapsulation incorporates the addition of address and control information to a packet
header so as to enable routing within and between mobile networks, as well as the removal
of the additional information in order to retrieve the original packet - a process called
decapsulation. Encapsulation and decapsulation are performed throughout the GPRS
network, from the mobile station to the GGSN. The tunneling function handles the
formation of a tunnel from the point of encapsulation to the point of decapsulation.
Tunnels are established between GSNs and between the SGSN and the Radio Network
Controller to carry data; entities encapsulate packets by adding GPRS-specific protocol
information to packet headers. Encapsulation in cooperation with tunneling offers a
transparent packet transfer mechanism. The transformation of external PDN protocol
stacks to the proprietary protocol stack used within the mobile network takes place only
in the GGSN, allowing for seamless integration of various protocol stacks.
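The transparency of encapsulation plus tunneling can be illustrated in a few lines: the tunnel entry point prepends network-specific information to the packet, and the exit point strips it, recovering the original packet unchanged. The 6-byte header layout below is invented purely to demonstrate the round trip; it is not the GPRS header format.

```python
# Sketch of encapsulation/decapsulation. The header layout (4-byte
# tunnel id + 2-byte payload length) is invented for illustration.

import struct

def encapsulate(payload: bytes, tunnel_id: int) -> bytes:
    """Prepend a 6-byte header to the original packet."""
    return struct.pack("!IH", tunnel_id, len(payload)) + payload

def decapsulate(packet: bytes) -> tuple[int, bytes]:
    """Strip the header, returning (tunnel_id, original payload)."""
    tunnel_id, length = struct.unpack("!IH", packet[:6])
    return tunnel_id, packet[6:6 + length]

pdu = b"user ip packet"
tid, restored = decapsulate(encapsulate(pdu, tunnel_id=0xABCD))
# restored == pdu: the transfer is transparent to the end points
```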

Domain name address resolution refers to the mapping of a network node domain
name5 to its corresponding network address and vice versa. The particular function
requires the presence of a Domain Name Server (DNS). The DNS maintains a database
for resolving domain names and network addresses. The typical situation in routing is
that a device is aware of a remote network node name and queries the DNS using the
node name to obtain its network address.

Compression and ciphering are mechanisms related to packet transfer. The
compression function reduces the amount of data needed to represent information
present in a packet without reducing the information itself, thus optimizing the use of
radio path capacity. The ciphering and integrity protection functions, aim at preserving
confidentiality and integrity of user data, by means of an encryption and an integrity
protection algorithm along with a secret key.

5
A domain name is a logical name separated by periods, which describes a network entity, e.g.
www.keletron.com.


2.2.2 GPRS Support Nodes


GPRS functionality is implemented on two network nodes, the Gateway GPRS Support
Node (GGSN) and the Serving GPRS Support Node (SGSN). Both nodes are referred to
as GPRS Support Nodes (GSNs), due to the fact that they were introduced to the GSM
Phase 2+ CN to support GPRS packet-switched services. The SGSN is the network
element serving mobile subscribers; it includes routing, mobility management and user
data confidentiality functionality. The SGSN keeps track of the location of mobile stations,
performs security functions and enforces subscriber access control to GPRS services.
The GGSN manages the interconnection of the PS domain with external PDNs, mainly
maintaining routing information for mobile users attached to the Packet-Switched domain.
This routing information is used to encapsulate and tunnel packets to the user mobile
equipment.

The GGSNs and SGSNs of a GPRS network are interconnected via an Internet Protocol
based backbone network. Additionally, there can be a backbone network interconnecting
GSNs belonging to mobile networks operating under different authorities (Figure 2-2). In
case a GGSN communicates with another mobile network over the inter-network
backbone, it is called Border Gateway and includes enhanced security functionality. The
typical case is that the authorities sign some form of Service Level Agreement which
defines the features of the interconnection, mainly Quality of Service guarantees and
billing strategies.

Communication between GSNs is realized over GPRS reference points.

• Gn, between the SGSN and the GGSN.

• Gp, between GSNs belonging to different mobile networks.

• Gi, between the GGSN and external PDNs.

As we have mentioned earlier, reference points define protocol stacks to realize
communication between network elements. Due to the fact that Gp is used for
interconnection of different mobile networks, the corresponding protocol stack may
include additional layers to provide security functionality.

The backbone interconnecting GSNs uses the Internet Protocol (IP) to provide Layer-3
functionality - with respect to the OSI reference model. User Datagram Protocol (UDP) is
the transport layer mechanism and the GPRS Tunneling Protocol (GTP) is a protocol
that enables user and signaling data tunneling between SGSNs and GGSNs (Gn
interface) and between GSNs in the inter-network backbone (Gp interface). GTP
addresses separately control and plain data transfer, defining a control plane protocol,
GTP-C and a user plane protocol, GTP-U. GTP-C is used for transferring GSN capability
information between GSN pairs, for path management and for creating, updating and
deleting GTP tunnels. GTP-U carries user data packets and signaling messages.

Each SGSN and GGSN has one or more IP addresses (IPv4 mandatory and IPv6
optionally) for inter-communication over the backbone network. Furthermore, each of the
IP addresses may also correspond to one or more logical GSN names, supported by a

- 23 -
Institute for Microelectronics and
UMTS Gi Interfacing and Measuring System
Information Technology

Domain Name Server. GSN IP addresses inside a mobile network form a private
address space that is not accessible from the public Internet. Both SGSN and GGSN
contain IP or other operator-selected routing functionality and they may be
interconnected using IP routers. Finally, let us note that the SGSN and GGSN
functionalities may be combined on the same physical network node, or they may reside
on different network nodes.

[Figure: two mobile networks, each with SGSNs and a GGSN on an intra-network IP-based backbone; the GGSNs connect to the Public Data Network over Gi and to each other over Gp via the inter-network backbone]

Figure 2-2 Intra and Inter-network backbones

2.3 GPRS Session Management and Routing


To exchange data packets with external PDNs, the mobile station must first of all register
with an SGSN of the GPRS network, a procedure called GPRS attach. During GPRS
attach, the SGSN checks if the user is authorized to access GPRS services, copies
user-related information from the HLR and assigns a Packet Temporary Mobile
Subscriber Identity to the user equipment. Following, the mobile station must apply for
an address used in the PDN it wishes to communicate with - i.e. an IP address if the
PDN is an IP-based network. The particular address is called Packet Data Protocol
(PDP) address.

2.3.1 Packet Data Protocol Address Assignment


The PDP address can be assigned by either the mobile network operator or the visited
PDN operator. There are two address allocation policies

• static,

• dynamic.


In the static address allocation scheme the mobile network operator permanently
assigns a PDP address to the mobile station, whereas in dynamic address allocation the
mobile station is assigned a PDP address as soon as the PDP context activation
procedure - explained in the next paragraph - is completed. When dynamic address
assignment is utilized the actual address can be allocated by either the mobile network
operator or the visited network operator. In either case a pool of addresses is reserved
for use by the GPRS operator and all mobile subscriber addresses are retrieved from the
particular pool. It is common practice for GPRS networks to enforce dynamic address
allocation, due to the fact that it enables more users to be supported. Static addresses
are permanently committed by the user they belong to, even if the user does not require
to use packet-switched services. Dynamic addresses on the other hand are assigned on
demand and only when the user wishes to use packet-switched services, leading to
efficient address pool management.

2.3.2 Packet Data Protocol Context


The PDP context is the descriptor of a session established between the user and the
PDN. It describes characteristics of the session; specifically it contains [5]

• the PDP type of the address assigned to the user; examples of address types are
IPv4, IPv6 and X.25,

• the PDP address allocated to the user,

• session QoS parameters,

• the address of the GGSN that serves as the gateway to the PDN.

The PDP context is stored in the mobile station, the SGSN and the GGSN. In order for a
mobile station to be visible from the external PDN, a PDP context must be created and
activated. The PDP context is created in the user equipment and activated using a
handshake procedure involving the mobile station, the SGSN and the GGSN (Figure 2-
3).

[Figure: message sequence between UE, SGSN and GGSN - “activate PDP context request” (UE to SGSN), security functions, “create PDP context request” (SGSN to GGSN), “create PDP context response” (GGSN to SGSN), “activate PDP context accept” (SGSN to UE); the messages carry the PDP type, PDP address, QoS parameters and GGSN address]

Figure 2-3 PDP context activation procedure


After creating the PDP context, the mobile station sends an “activate PDP context
request” message to the SGSN. If static address assignment is used, the PDP context
contains the station address; otherwise the address field is left empty. The SGSN
performs any necessary security functions, e.g. user authentication, and if the user
supplied the expected credentials it sends a “create PDP context request” message to
the appropriate GGSN. The GGSN then creates an entry in its PDP context table and
returns a “create PDP context response” message to the SGSN, containing the PDP
address in case dynamic address assignment was requested. Finally, the SGSN updates
its PDP context table and sends an “activate PDP context accept” message to the mobile station,
confirm the activation of the new PDP context. The PDP context table entries in the
SGSN and GGSN enable the GSNs to route data packets between the user equipment
and the external PDN.
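The handshake can be sketched in Python as follows; the dictionary-based context tables and field names are our own illustration, not 3GPP-defined data structures.

```python
def activate_pdp_context(sgsn_table, ggsn_table, request, dynamic_address):
    """Sketch of the PDP context activation handshake.

    `request` mimics the "activate PDP context request"; a PDP address of
    None denotes dynamic address assignment.
    """
    # SGSN (after the security functions) forwards a "create PDP context request"
    context = dict(request)
    # GGSN allocates an address when dynamic assignment was requested ...
    if context["pdp_address"] is None:
        context["pdp_address"] = dynamic_address
    # ... stores the context in its PDP context table ...
    ggsn_table[context["pdp_address"]] = context
    # ... and its "create PDP context response" lets the SGSN do the same
    sgsn_table[context["pdp_address"]] = context
    # the "activate PDP context accept" returns the final context to the station
    return context
```

With both tables updated, packets for the assigned PDP address can be routed between the user equipment and the external PDN.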

2.3.3 Packet Routing Process


Once the PDP context has been activated, the mobile station can exchange data with
the PDN. Packets are forwarded between the user and the PDN by means of the routing
process. The mobile station sends packets destined for a host residing in the PDN - e.g.
a web server - to the SGSN it has registered with. The SGSN uses GTP-C to set up a
tunnel with the appropriate GGSN at the beginning of the session and GTP-U to
tunnel user signaling and data packets to that GGSN. For each packet
received from the mobile station, the SGSN examines the corresponding PDP context,
encapsulates the packet and sends it to the GGSN over the intra-network backbone.
The GGSN decapsulates the packet and if necessary modifies packet headers
according to the protocol stack used in the PDN. It finally sends the packet out to the
PDN, where internal routing mechanisms forward it to its destination.

In the reverse path, a host residing in the PDN wishes to send a packet to the mobile
station. The mobile station has been assigned an address retrieved from the address
pool of the GPRS operator, and therefore its address has the same prefix as the GGSN
address. The host sends packets for the particular mobile station out to the network and
the network routing mechanisms forward the packet to the appropriate GGSN, located in
the user home network6. The GGSN queries the HLR to retrieve current user location,
creates the tunnel with the SGSN which serves the user by means of GTP-C and
encapsulates packets to the SGSN using GTP-U. Packets are routed using either the
intra or inter-network backbone, depending on current user location. Finally, the SGSN
decapsulates the packets and delivers them to the mobile station.

2.4 UMTS - IP Networks Interworking


UMTS specifications have addressed interworking between UMTS and IP-based
networks [6] as well as X.25-based networks. Nevertheless, there also is the possibility
of UMTS interworking with networks utilizing proprietary protocols. In the case of
interworking with IP, the UMTS specification dictates that the OSI layer 3 used by the
GGSN on the Gi interface should be the Internet Protocol, while layers one and two
match those used in the PDN.

6 The mobile subscriber home network is the network operated by the authority with which the user has signed a service level agreement.


2.4.1 Gi Protocol Stack


Computer networking has for some time now converged to the use of Ethernet as the
data link layer technology and User Datagram Protocol or Transmission Control Protocol
as the transport layer technology. This situation results in the typical Gi protocol stack
deploying Ethernet and UDP (Figure 2-4).

[Figure: the GGSN and the Public Data Network endpoints of the Gi interface, each running the stack user IP / PPP / L2TP / TCP-UDP / IP / Ethernet / physical]

Figure 2-4 Typical Gi interface protocol stack

It is common practice for mobile equipment to transfer IP datagrams encapsulated in
Point-to-Point Protocol (PPP) frames, due to PPP's simple data transfer mechanism. PPP
is designed for simple links which transfer packets between two peers. The links provide
full-duplex simultaneous bi-directional operation and are assumed to deliver packets in
order. PPP frames can be carried over packet-switched networks using the Layer 2
Tunneling Protocol (L2TP). L2TP forms a tunnel between the GGSN and the PDN
operator tunnel endpoint to transfer PPP frames over the IP-based network, as if there
actually were point-to-point links. In the following subsection a brief description of PPP
and L2TP is provided; UDP functionality is considered well-known due to wide
deployment of UDP/IP networks, whereas Ethernet functionality will be presented in
conjunction with the chip that implements it on the test tool board in chapter 4.

2.4.2 Point-to-Point Protocol


The goal of PPP [7] was to provide a common solution for easy connection of a wide
variety of hosts, bridges and routers. The PPP architecture consists of three parts

• a method for encapsulating network layer protocol datagrams,

• a Link Control Protocol (LCP) for establishing, configuring and testing the data-
link connection,


• a family of Network Control Protocols (NCPs) for establishing and configuring
different network layer protocols.

PPP encapsulation supports simultaneous multiplexing of different network layer
protocols over the same link. LCP is delegated the control and management of
PPP sessions. It is used to automatically agree upon session parameters, like
encapsulation format options and handling varying limits on packet sizes, as well as to
manage the session by detecting looped-back links, identifying other common
misconfiguration errors and terminating the link. Additionally, LCP includes optional
facilities for authenticating the identity of the peer on a link and for determining
whether the link is functioning properly or failing.

NCPs provide a method for establishing and configuring different network layer
protocols. They constitute a family of control protocols, each one managing the specific
needs of the respective network layer protocol it was designed for. In the case of IP-
based networks, the corresponding NCP is called Internet Protocol Control Protocol
(IPCP). Examples of IP parameters configured via IPCP are the use of a specific
compression mechanism for the TCP/IP header and IP address assignment.

2.4.3 Layer 2 Tunneling Protocol


L2TP [8] provides a dynamic mechanism for tunneling layer 2 “circuits”, either virtual or
physical, across a packet-oriented network; it was originally defined as a standard
method for tunneling PPP sessions. L2TP consists of

• the control protocol for dynamic creation, maintenance and teardown of L2TP
sessions,

• the data encapsulation process to multiplex and demultiplex layer 2 data streams
between two L2TP nodes across an IP-based network.

Consequently, L2TP defines control and data messages. Control messages are used to
establish, maintain and clear control connections and L2TP sessions, utilizing reliable
message transfer mechanisms to guarantee delivery - i.e. retransmission of lost
messages. Data messages are used to encapsulate layer 2 traffic carried over the L2TP
session. The data messages transfer mechanism is unreliable and therefore does not
guarantee delivery.
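Since the T bit of the L2TPv2 header flags (per the L2TP specification, RFC 2661) distinguishes the two message classes, a receiver's classification step can be sketched as below; the byte values in the usage note are merely illustrative.

```python
def is_control_message(header):
    """Classify an L2TPv2 message from its first two header bytes.

    The most significant bit of the flags field (the T bit) is 1 for
    control messages and 0 for data messages.
    """
    flags = int.from_bytes(header[:2], "big")
    return bool(flags & 0x8000)
```

For example, is_control_message(b"\xc8\x02") returns True (control), whereas is_control_message(b"\x40\x02") returns False (data).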

2.4.4 Packet Transfer Procedure


To glue all pieces together, we will describe the route of a packet from the mobile station
to the PDN, when the Gi interface is built on a protocol stack utilizing IP, UDP, L2TP and
PPP. The mobile station must first of all have activated a PDP context to be visible from
the PDN. Prior to any data transfers taking place, the signaling necessary to establish
and configure tunnels and PPP session parameters must be performed. There should be
a GTP tunnel between the RNC and the SGSN and another one between the SGSN and
the GGSN, formed using GTP-C. Furthermore, the GGSN must have established and
configured the L2TP tunnel between itself and the tunnel endpoint on the PDN side.
Finally, PPP signaling procedures must have been completed using LCP and IPCP, in


order for PPP session and IP-specific parameters to be agreed upon between the GPRS
and the PDN network. The mobile station is able to send and receive packets as soon as
the signaling process is executed successfully.

An application running on the mobile station, e.g. a web browser, prepares a packet and
delivers it to the transport layer, which encapsulates it in a UDP packet. The UDP packet
is then enclosed in an IP packet, which in turn is encapsulated in a PPP frame. The PPP
frame is transferred over the air interface to the RNC. The RNC then encapsulates the
PPP frame in a GTP packet and forwards it to the SGSN serving the particular user. The
SGSN routes the packet to the GGSN interfacing to the PDN the packet recipient
resides. The GGSN is responsible for decapsulating the frame and re-encapsulating in a
L2TP packet so as to traverse the IP-based PDN. Additionally, the L2TP packet is
enclosed in a new IP packet so as to traverse the packet-switched network and reach
the L2TP tunnel endpoint. The new IP packet destination address is the tunnel endpoint
IP address. Usually, the tunnel endpoint is a router behind which the destination host
resides. Finally, the L2TP tunnel endpoint retrieves the original IP packet of the mobile
station and forwards it to the host it is destined for (Figure 2-5).

[Figure: the mobile station's user IP packet is carried in PPP, tunneled over GTP-U between SGSN and GGSN (Gn) and over L2TP between GGSN and PDN router (Gi); both tunnels run over UDP / transport IP / Ethernet / physical, and the router delivers the user IP packet to the server in the Public Data Network]

Figure 2-5 Packet transfer and related protocol stacks

Depending on the data link technology used in the PDN, the tunnel endpoint may
transmit the PPP frame as is or enclose it in the corresponding data link-specific frame,
e.g. Ethernet. The procedure for routing packets from the PDN host to the mobile station
is analogous.
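The successive encapsulations can be mimicked with nested records; the layer names follow Figure 2-5, while the helper functions themselves are only a sketch of the layering idea, not part of the test tool.

```python
def encapsulate(payload, *layers):
    """Wrap a payload in successive protocol layers, innermost layer first."""
    packet = {"data": payload}
    for layer in layers:
        packet = {"layer": layer, "data": packet}
    return packet

def layer_stack(packet):
    """List the layers of a nested packet, outermost layer first."""
    stack = []
    while "layer" in packet:
        stack.append(packet["layer"])
        packet = packet["data"]
    return stack
```

For the GGSN output towards the PDN, encapsulate("user data", "user IP", "PPP", "L2TP", "UDP", "transport IP", "Ethernet") produces a packet whose layer_stack reads Ethernet, transport IP, UDP, L2TP, PPP, user IP - outermost to innermost.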

An important point in the procedure described above is the use of
two IP planes

• the transport plane,

• the user IP plane.

The transport plane refers to the IP layer of the backbone network used to transfer
packets from the mobile station to the GGSN, whereas the user IP plane refers to the IP
packet carrying data generated by mobile station applications.


2.5 UMTS Gi Interfacing and Measuring System


The Gi interfacing and measuring system implemented during our thesis project
assumed the commonly configured protocol stack of the Gi interface, as it is depicted in
Figure 2-4. The system simulates the functionality of an IP-based Public Data Network
towards a GPRS Gateway Support Node (GGSN). In order to deliver such functionality it
imitates the behavior of a PDN router to the GGSN, by generating traffic and gathering
statistics related to specific protocol layers and/or “virtual” users supposed to be
generating the actual traffic.

2.6 Additional IP-related Functionality


Having presented the protocol stack upon which our implementation will be based, we
will conclude chapter 2 with a brief description of complementary IP-related functionality.
This section covers functions that 3GPP does not mandate, but has made provision for.

The PS domain supports additional functionality related to IP-based networks, the most
important of which are the Dynamic Host Configuration Protocol (DHCP), the Domain
Name Server service, Internet Protocol Security (IPSec), IP multicast and Authentication,
Authorization and Accounting (AAA). DHCP provides a means for dynamic allocation of IP
addresses as well as a mechanism for passing additional configuration parameters to
hosts connected to a TCP/IP network. DHCP support is accomplished by a packet-
switched domain specific DHCP relay agent, which allows correct routing of DHCP
requests and replies between the mobile station and the DHCP servers. The DHCP
relay agent is located in the GGSN. Moreover, the GPRS or the external network can
maintain a Domain Name Server to map external PDN host domain names to IP
addresses.

The GGSN can optionally run the Internet Protocol Security protocol on the Gi interface
in order to provide data confidentiality and integrity. IPSec is positioned directly above
layer 3 with respect to the OSI reference model, requiring the services of the Internet
Protocol. Alternative security protocols may be used on the basis of a mutual agreement
between network operators. Similarly, for inter-network routing information exchange the
Border Gateway Protocol (BGP) is suggested, but the use of alternative protocols may
be agreed upon.

There is also provision for IP multicast traffic. Multicast support imposes
further requirements on the GGSN. An IP-multicast proxy must be integrated into the
GGSN; furthermore, the GGSN must support the Internet Group Management Protocol
(IGMP) and one or more Inter-Router Multicast protocols, such as the Distance Vector
Multicast Routing Protocol, Multicast Open Shortest Path First (an extension to Open
Shortest Path First to support multicast), or Protocol Independent Multicast - Sparse
Mode. The IGMP protocol enables endpoints to inform their adjacent routers which
network layer multicast addresses they wish to receive, whereas the Inter-Router
Multicast protocols enable routers to exchange routing information and form their
multicast routing tables. From the packet domain point of view, multicast traffic is
handled at the application level and sent over UDP/IP.


Finally, an important issue for service providers and network operators is network and
service access control and accounting. Remote Authentication Dial-In User Service
(RADIUS) can be optionally used by the GGSN in order to authenticate users and
provide accounting information. In that case, both RADIUS authentication and
accounting client functions reside in a GGSN. The client functions send the relevant
information to the corresponding servers. The authentication server decides if the mobile
user is authorized to access the network and/or the service, and if so grants access and
optionally provides network information (e.g. an IP address). The accounting server
maintains information sent by the corresponding client function, which it uses to identify
and bill the mobile user.

3. Test Tool Functional Principles


3.1 Keletron Test Tool Functionality


Keletron has designed and implemented a mobile networks test tool targeted to UMTS
and GPRS networks. The test tool provides the means for network diagnostics and
planning as well as performance and conformance testing of GPRS and UMTS Core
Network equipment. It consists of four distinct systems, each one providing the ability to
evaluate a particular interface implementation

• Iu-Packet Switched.

• Iu-Circuit Switched.

• Gn-SGSN.

• Gi-PDN.

The principle the test tool systems are built on is the simulation of network elements'
external behavior towards the device under test. Specifically, the Iu-PS system simulates several
RNCs over the Iu-PS interface towards the Core Network, whereas the Iu-CS system
offers the same functionality for the Iu-CS interface. The Gn-SGSN system is used for
evaluating Gn interface implementations towards the GGSN under test. Finally, the Gi-
PDN system simulates the behavior of the Public Data Network over the Gi interface in
order to examine a GGSN behavior.

The actual simulation of networking equipment behavior is realized through the
simulation/emulation of the protocol stack over which the equipment communicates with
the device tested. The test tool generates traffic towards the device by means of data
and/or signaling packets, whose structure is dictated by the protocol stack of the
particular interface. In essence, the test tool emulates a “cloud of subscribers” and the
corresponding traffic they would generate, were they present in the network. The
generated traffic acts upon the Core Network and in turn generates “responses” from the
elements that comprise it. The “responses” are collected and analyzed by the test tool in
order to collect statistics related to the mobile network and/or the equipment behavior.

3.2 Building Components


The test tool application is realized by means of its software and hardware components.
The application software runs on the host processor of the Single Board Computer. Its
tasks include configuration and initialization of the test tool. The operator is able to
assign values to a number of configuration parameters of the test tool using a Graphical
User Interface (GUI), thus creating customized test scenarios. Test scenarios consist of
data and/or signaling packets as well as data structures related to “virtual” users. The
application software is responsible for constructing the particular data structures and
packet headers and downloading them to the RAMs present in the test tool cPCI board.

As soon as the configuration and downloading process is completed, the test tool is
ready to begin its operation. This is where the hardware comes into play. Test tool
hardware consists of the cPCI master card and the rear panel boards - a detailed
description of the cPCI standard and system architecture can be found in Appendix C.


The main card assembles data and/or signaling packets using the data structures and
packet headers constructed by the application software. Packet headers are
complemented with “dummy” payload. Collected statistics are based on packet header
information; consequently, adding meaningful packet payload would serve no purpose.
Finally, the packets assembled are handed over to the rear panel board for transmission.
The actual rear panel board used depends on the interface data link layer protocol.

3.3 Configuration Process


As we mentioned earlier the test tool simulates the presence of subscribers to the
network by generating “virtual user”-originated traffic. Within the test tool context, traffic
generation is translated into the process of creating and transmitting packets to the
network under test for each user that is supposed to be active during simulation. The
aforementioned point raises a number of issues regarding the way traffic should be
generated

• how should new “virtual” users be added to the simulation,

• what kind of packets must the test tool create for each “virtual” user,

• how often should packets be sent for a “virtual” user.

To cope with these issues, the test tool incorporates a number of configuration
parameters related to users that become active during simulation. The configuration
parameters are selected and entered by the test tool operator through a Graphical User
Interface. The main product of the configuration process is a number of Traffic Profile
(TP) and User Profile (UP) data structures. TPs and UPs are downloaded to the test
tool RAMs, along with the data and/or signaling packet headers constructed by the
application software.

3.3.1 Traffic and User Profiles


TPs and UPs are the key enablers of the test tool hardware functionality, allowing the
hardware to address traffic generation considerations mentioned previously. Each
“virtual” user is associated with a TP and a UP. There is a one-to-many relationship
between a UP, a TP and users. Practically speaking, one UP and one TP can be
simultaneously deployed by many users. A TP accurately describes the traffic
characteristics of a user flow7. TPs are configured in terms of packet size and data rate;
the operator can define a data rate and a packet payload length value. The data rate
refers to the amount of bandwidth a user utilizing the specific TP should consume and
the payload length refers to the total length user packets should have. Dividing the
packet size n (bits) by the data rate d (bits/sec) gives the timer duration t (sec).

t = n/d

For example, a packet length equal to 80 bytes (640 bits) and a user data rate equal to
64Kbits/sec yields a timer duration of 10 milliseconds. This means that a user utilizing

7 A user flow is an aggregate of users having the same traffic characteristics.


the TP with the particular characteristics should generate an 80-byte packet every 10
milliseconds. In general, whenever the timer of a TP - having packet size n and timer
duration t = n/d - elapses, the hardware must generate one n-bit packet for each user
incorporating the specific TP.
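The computation can be restated in a few lines (in Python, rather than the VHDL of the actual implementation), reproducing the 80-byte / 64 Kbit/s example above:

```python
def tp_timer_duration(payload_bytes, data_rate_bps):
    """Traffic Profile timer duration t = n / d, in seconds."""
    n = payload_bytes * 8      # packet size n in bits
    return n / data_rate_bps   # d in bits per second
```

tp_timer_duration(80, 64000) returns 0.01, i.e. one 80-byte packet every 10 milliseconds.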

The UP on the other hand defines the way a user enters simulation, as well as the kind
of datagrams that must be created for the specific user. The operator can decide to
activate users in steps, in which case successive users will be entering the simulation
with a time interval calculated as a random value between minimum and maximum
values defined by the operator. The alternative is to activate users at a specified rate, e.g.
15 users per second. User activation can be delayed by a random time interval
calculated by the test tool, based on minimum and maximum delay values supplied by
the operator. Last but not least, UP configuration enables the operator to associate a UP
with a specific TP. UPs are combined with TPs to produce transmit Traffic Events (TEs).
In essence, a transmit TE leads to the generation and transmission of a specific packet,
whose header type is defined in the user UP and the corresponding packet payload
length is defined in the user TP. Such a packet should be transmitted according to the
data rate supplied by the operator in the user TP.
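A minimal sketch of the stepped activation policy follows; the function and parameter names are ours, and the random intervals simply model the operator-supplied minimum and maximum bounds.

```python
import random

def stepped_activation_times(num_users, min_interval, max_interval, seed=None):
    """Instants at which successive users enter the simulation, each one
    delayed by a random interval within the operator-defined bounds."""
    rng = random.Random(seed)
    times, now = [], 0.0
    for _ in range(num_users):
        now += rng.uniform(min_interval, max_interval)
        times.append(now)
    return times
```

Each returned instant is strictly later than the previous one, and every gap between successive activations lies within the configured bounds.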

3.3.2 Configuration Files


The configuration process depends on a set of parameters supplied by the test tool
operator, by means of the application software GUI. However, in certain cases the test
tool needs additional input which is provided by the operator in the form of configuration
files.
[Figure: GUI configuration and configuration files supply parameters to the application software, which creates user packets, user and traffic profiles and downloads them to board RAM; on the master board the Generator assembles “virtual user” packets and hands them to the rear transition board for transmission, while the Analyzer fetches received packets from the rear transition board and performs parsing and statistics gathering; the rear transition board performs layer 1 and 2 functions]

Figure 3-1 Test tool application configuration sequence and operation

In general, configuration files are used to specify parameters not included in the GUI,
such as packet header fields. Their actual content depends on the test tool system used.


The complete configuration sequence along with a high-level view of the test tool
operation is depicted in (Figure 3-1).

3.3.3 Test Scenarios Creation


A test scenario is a complete test case, defining all aspects of the simulation process. A
test scenario specifies

• the number of “virtual” users that will be activated during simulation,

• the kind of traffic they will generate,

• the number and characteristics of user flows,

• the time interval during which the simulation will take place.

The kind of traffic a user generates is information contained in the user's User Profile,
whereas the characteristics of user flows are defined in Traffic Profiles. The number of
user flows is equal to the number of TPs. A user becomes a member of a user flow
by incorporating the TP describing the user flow. The TP in conjunction with the UP
indicate to the hardware the type of packet that should be generated for a particular
user, as well as the packet generation frequency. In order to complete the test scenario
definition, the operator is further called to supply the number of users that will participate
in the simulation as well as the total duration of the simulation process. Both of these
parameters are configured via the GUI. The operator after creating TPs and UPs selects
the actual UPs that will be used in the simulation and the number of users that will utilize
a particular UP. Finally, the total test scenario duration is indirectly defined by supplying
the time interval for which each UP, and consequently all users utilizing the specific UP,
will be active.

Finally, we should mention that the test tool provides an extra level of flexibility, allowing
the operator to save TPs, UPs and scenarios created. This approach enables the
operator on the one hand to re-execute the same test scenarios and on the other hand
to re-use existing TPs and UPs, creating new scenarios with minimum effort.

3.4 Simulation Time and Ticks


A tick is the basic time unit of the simulation; it is a multiple of the digital clock period
the system is using. A tick equals the system clock cycle duration, within which all
events whose timers have expired must be triggered. During simulation, time is
measured in ticks and, as a result, user timer durations are defined in ticks. The test
tool architecture imposes an upper bound on the number of events that can be
triggered within a system clock cycle and consequently a lower bound on the tick value.
Specifically, the minimum tick value is 300 times the digital clock period. For example, if
the digital clock of the test tool runs at 50 MHz - which yields a clock cycle equal to
20 ns - then the system clock should run at no more than 167 kHz, which corresponds
to a tick value of 6 µs.
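The bound can be restated in one line of arithmetic; the helper below merely re-expresses the 300-cycle limit quoted above.

```python
def min_tick_seconds(digital_clock_hz, cycles_per_tick=300):
    """Lower bound on the tick: 300 digital clock cycles, per the
    test tool architecture's event-triggering limit."""
    return cycles_per_tick / digital_clock_hz
```

min_tick_seconds(50e6) returns 6e-06 (6 µs), matching a maximum system clock of about 167 kHz.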


3.5 Test Tool Hardware Architecture


The test tool is implemented as a 6U cPCI master card, accompanied by interface-
specific rear transition modules. The 6U card interfaces to the Single Board Computer by
means of the system-wide cPCI bus and is responsible for “constructing” data and
signaling packets, which it further hands over to the rear transition modules. Additionally,
it parses packets received by the rear transition modules and gathers statistics based on
their header information. Statistics are transferred to the Single Board Computer host
processor once per second, using an interrupt-driven scheme: the test tool
microprocessor notifies the host with an interrupt signal, the host then collects the
statistics from the board memories over the cPCI bus, and the application software
finally displays them on the monitor. Test tool memory and
hardware device configuration registers are memory-mapped to the host processor
address space; all hardware and software configuration as well as statistics data is
transferred to/from the test tool board devices as if they were memory locations.

The 6U card is mounted on the cPCI connectors in the front side of the backplane. Rear
transition modules on the other hand are mounted on the rear of the backplane, as their
name implies. They are the modules managing data link layer specific functions and
transmitting packets assembled by the main board. Therefore, rear transition cards are
specific to the data link layer technology of the particular interface simulated.

3.5.1 Master Board


The 6U master card incorporates the following building blocks (Figure 3-3)

• a microprocessor,

• a number of Random Access Memories (RAMs),

• a Content Addressable Memory (CAM),

• two Field Programmable Gate Arrays (FPGAs),

• a PCI bridge.

The microprocessor is delegated configuration and control tasks. It is the interface of the
test tool towards the host processor. RAMs along with the CAM constitute the system
volatile memory. RAMs are used for configuration and statistics data storage; the CAM is
currently not utilized, but is intended to be incorporated in a traffic classification scheme.
RAMs store packet headers and user and protocol-level statistics; additionally, they may
store Traffic and User Profiles. FPGAs add flexibility to the test tool hardware
architecture, allowing it to be highly customizable and configurable.

The need to emulate various interfaces of the UMTS network requires the capability to
accommodate the range of protocols and data link layer mechanisms utilized by any
particular interface. FPGAs contain the hardware logic required to assemble data and
signaling packets along with glue logic to interface to the specific data link layer
technology used for the interface. All of the interfacing and measuring systems


mentioned in the previous subsection are implemented on the two FPGAs. For the
purposes of the present thesis project the FPGAs were used to implement the Generator
and Analyzer modules. The Generator is responsible for assembling user packets and
delivering them to the rear panel card for transmission, whereas the Analyzer is
responsible for fetching packets received by the rear panel card, parsing them and
generating statistics based on information contained in their header fields.

Last but not least, the PCI bridge handles cPCI bus accesses and data transfers.
Specifically, whenever the test tool needs to use the system-wide cPCI bus, the PCI
bridge arbitrates for the bus on behalf of the test tool and performs the actual data
transfer. It interfaces on the one hand to the test tool local bus and on the other hand to
the cPCI bus. Additionally, upon system startup the host processor downloads
configuration data to the test tool hardware utilizing the PCI bridge.

3.5.2 Rear Transition Modules


Rear transition modules are mounted on the backplane rear panel connectors. The
backplane supports rear-panel I/O through double-headed connectors, mirroring front
board signals to the rear of the backplane. The rear transition modules are delegated the
data link and physical layer specific functions. Packets assembled by the main board are
handed over to the rear panel card for transmission.

A number of different modules is required in order for the test tool to simulate the
protocol stacks of the various interfaces. The Iu-PS and Iu-CS systems generate traffic over the Iu-PS
and Iu-CS interfaces, whereas the Gn-GGSN and Gi-PDN systems create data and
signaling packets over the Gn and Gi interfaces correspondingly. In chapter 2 we
described the Iu interfaces and the Asynchronous Transfer Mode protocol they deploy
as their data transport mechanism. The Gn and Gi interfaces, on the other hand, build
their services on top of Ethernet.

It is evident from the above that the master card should be accompanied by the rear
panel board which performs the interface-specific layer one and two functions.
Therefore, there are two rear transition cards

• Ethernet-based

• ATM-based

The Ethernet card is the one of importance to the thesis project, since the scope of the
project is to design and implement the Gi interfacing and measuring system. The
Ethernet board has four 10/100Mbit physical interfaces and can therefore accommodate
up to 400Mbit/sec of aggregate “virtual” user traffic data rates. The interface of the
master card to the Ethernet board is implemented in the Xilinx FPGAs. Both the
Generator and the Analyzer need to exchange data with the Ethernet board; the
Generator delivers packets assembled for transmission, whereas the Analyzer retrieves
packets received in order to perform the related parsing and statistics gathering.
Specifically, the Analyzer and Generator modules communicate with the device
performing Ethernet MAC layer processing on the card. The following sections shall
briefly describe the Ethernet standard, concluding the chapter with an overview of the


actual chip used in the Ethernet rear transition board, so as to give an intuition of the
functionality it provides and the way it communicates with the FPGA modules.

3.6 Ethernet
The term Ethernet refers to the family of Local Area Network products conforming to the
IEEE802.3 standard that defines what is commonly known as the CSMA/CD protocol, as
well as a number of physical link types. The CSMA/CD protocol regulates access of
network nodes to the physical link used for data transmission, whereas physical link
types define the data transmission rate, the signal encoding, and the type of media
interconnecting the network nodes. For example, Gigabit Ethernet is defined to operate
over either twisted-pair or optical fiber cable. Each kind of cable or signal-encoding
procedure requires a different physical link layer implementation resulting in a different
physical layer type.

3.6.1 IEEE802 Data Link Sublayers


In the IEEE802.3 standard - and in all IEEE802 family standards - the ISO data link layer
is divided into two IEEE802 sublayers

• The Media Access Control (MAC) sublayer

• The MAC client sublayer

The IEEE802.3 physical layers correspond to the ISO physical layer, and are specific to
the physical link type interconnecting network nodes.

Of particular interest to this project is the MAC sublayer, as both the Generator and the
Analyzer need to interface to a chip that implements MAC sublayer functionality. The
primary responsibilities of the MAC sublayer are

• to encapsulate data provided by the MAC sublayer client into Ethernet frames,

• to parse Ethernet frames upon reception and check them for errors,

• to perform media access control, including initiation of frame transmission and
recovery from transmission failure.

The IEEE802.3 standard requires a basic data format for all MAC implementations. The
format includes a number of standard fields that must be implemented, as well as
optional extensions [9]. An Ethernet frame with all standard fields is depicted in (Figure
3-2). The preamble field synchronizes frame reception with the incoming bit stream. It is
a 7-byte field consisting of alternating ones and zeros, indicating that a frame follows.
The Start Frame Delimiter (SFD) is a 1-byte field that also consists of alternating ones
and zeros, with the last two bits being ones. It indicates to the receiving station
that the next 6 bytes it receives will be the frame destination address. The frame
Destination Address (DA) identifies the station or stations that should receive the frame.
It contains either a specific station address, the broadcast address or a multicast
address. The Source Address (SA) identifies the station that sent the frame.


The Length/Type field is a 2-byte quantity, which is interpreted in two ways depending
on its value. If the value is 1500 or less, it indicates the number of MAC-client data
bytes contained in the data portion of the frame; if it is 1536 or greater, it identifies
the type of the frame. The frame data field consists of 46 to 1500 informational bytes.

PRE  SFD  DA  SA  Length/Type  higher layer data  Pad  FCS
(the higher layer data and Pad fields together form the frame payload)

Figure 3-2 Ethernet frame structure

Finally, the Frame Check Sequence (FCS) is a 32-bit Cyclic Redundancy Check (CRC) that
is created by the sending station and recalculated by the receiving station to check for
damaged frames. The CRC is computed as a function of the source and destination
address fields, the type/length field and the data field. The specification also defines a
minimum and a maximum size for the Ethernet frame, referring to the total frame length
excluding the preamble and start of frame fields. The minimum Ethernet frame size is 64
bytes - the 46-byte minimum data field plus 18 bytes of addressing, type/length and
FCS - whereas the maximum size is 1518 bytes.
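The field sizes listed above can be tallied to confirm the frame-size limits; a small sketch in which the dictionary layout is ours:

```python
# Byte counts of the standard IEEE802.3 frame fields described above.
# The preamble (7 bytes) and SFD (1 byte) are excluded from the limits.
FIELDS = {"DA": 6, "SA": 6, "Length/Type": 2, "FCS": 4}
OVERHEAD = sum(FIELDS.values())      # 18 bytes of addressing, type and CRC

MIN_DATA, MAX_DATA = 46, 1500        # allowed data field sizes

min_frame = OVERHEAD + MIN_DATA      # minimum Ethernet frame: 64 bytes
max_frame = OVERHEAD + MAX_DATA      # maximum Ethernet frame: 1518 bytes
```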

3.6.2 Medium Independent Interface


Furthermore, there is a standardized interface between the MAC sublayer and the
physical layer that provides media independence by separating MAC controllers and
transceivers both functionally and physically. MAC controllers provide MAC sublayer
functionality and interface with higher protocol layers. Transceivers, on the other hand,
are specific to each media type and include functions such as signal encoding that
facilitate the actual data transmission. The aforementioned interface is named the
Medium Independent Interface (MII) and defines a set of signals that the MAC controller
and the physical layer chip must utilize to synchronize frame transmission and reception
and to perform medium access control [10].

In general, the MII consists of two 4-bit wide data buses - for sending and receiving data
to and from the transceiver - and a control interface containing signals that indicate to
the controller whether the medium is idle, whether a collision has occurred during frame
transmission and finally whether a physical layer error has occurred. Additionally, there
are signals necessary to synchronize frame transfers between the controller and the
transceiver.

3.7 IXF440 Multiport 10/100Mbps Ethernet Controller


The IXF440 MAC controller [11] resides on the Ethernet rear transition board,
performing all MAC layer processing. Both the Generator and the Analyzer interface to
the IXF440 in order to send and receive packets from the Ethernet-based network. The
IXF440 includes a number of features and configuration parameters, amongst which we
will only refer to the ones used to design the Generator and Analyzer data exchange
path with the chip.


The IXF440 chip handles Ethernet MAC layer processing. On one side it interfaces
to the Generator, from which it receives packets assembled for transmission, and to the
Analyzer, to which it delivers received packets for further processing and statistics
gathering. On the other side it interfaces, through the standard MII, to Ethernet quad
transceivers that implement the physical layer functionality. The IXF440 includes 8
independent 10/100 Mbps Ethernet MACs, and maintains Simple Network Management
Protocol (SNMP) and Remote Monitoring (RMON) statistics counters. Finally, there is a
CPU interface used to program chip configuration registers and collect statistics from
the chip counter sets (Figure 3-3). The CPU interface is generic and supports a wide
range of standard controllers. Each of the 8 IXF440 MACs has its own independent
registers, and each register is accessible through the data bus and address bus of the
CPU interface. A specific port is addressed using port select signals, whereas each port
has a dedicated interrupt signal to report special events to the CPU.

The Ethernet MAC sublayer-related processing the IXF440 chip performs includes

• medium access control

• IEEE802.3 frame assembly and transmission

• CRC checking on the receive side and calculation on the transmit side

During frame assembly the 56 preamble bits and the SFD are prepended to the head of
the packet, whereas the 32-bit CRC is calculated and appended to its tail. The frame
supplied to the IXF440 chip is stored in one of the transmit FIFOs, according to the
physical port through which it will be sent to the network. The frame must already
contain the source and destination address fields, the type/length field and the data
field. If the data field is shorter than 46 bytes, it is padded to 46 bytes with zeros. The
IXF440 initiates frame transmission and handles possible collisions by rescheduling
frame transmission based on the exponential back-off algorithm of the CSMA/CD
protocol. Medium access control is realized via the MII control signals between the
controller and the transceiver.

Packets received from the network are stored in one of the 8 receive FIFOs, depending
on the physical port on which they were received. After detecting the preamble, the MAC
controller checks for the SFD byte. If the frame delimiter is invalid, reception of the
current frame is aborted and the controller waits until network activity ceases before
monitoring the line for a new preamble sequence. If the SFD byte is valid, the IXF440
receives the entire frame, considering the last 4 bytes to be the CRC. It then checks the
CRC and reports all errors.

Each MAC incorporates two 256-byte independent FIFOs for packet transmission and
reception. FIFOs buffer packets coming from the Generator as well as frames received
from the network and destined for the Analyzer. All data is transferred between the
FPGA modules and the IXF440 chip over a common FIFO interface, the IXBus. The
IXBus is a 64-bit wide data bus, whose operation can be configured in 3 different modes

• split. Split mode supports a pair of unidirectional 32-bit buses, where one bus is
dedicated to packet reception from “MAC client” hardware and the other to packet
transmission to “MAC client” hardware,

• narrow. In this mode, the IXBus acts as a 32-bit bidirectional bus, so packet
transmission and reception cannot overlap,

• full. Full mode supports a 64-bit bidirectional bus.

The IXF440 controller also incorporates control signaling related to the IXBus interface.
Specifically, there are byte enable signals, one for each of the 8 bytes the IXBus can
transfer. During a transfer, the byte enable signals indicate which bytes should be
considered valid by the receiver. Furthermore, a 32-bit status word containing frame
description information follows every transfer from the IXF440 device to a MAC client - in
our case the Analyzer. In addition, the IXF440 chip has two control signals indicating
the start and the end of a frame transfer respectively. When the IXF440 FIFOs are the
target of the transfer, the signals must be driven by the transaction initiator, whereas
whenever the IXF440 initiates a transfer, it drives the signals itself. The frame status
word allows the Analyzer to maintain certain types of statistics regarding received
frames. Additionally, the controller can be programmed with respect to the transfer of the
frame CRC, as well as the transfer of frames containing CRC errors; such frames are
then not discarded from the FIFOs but treated as regular data instead. Nevertheless, the
CRC error event is reported in the corresponding status word. Finally, the controller can
be instructed to transmit a frame exactly as it was received from the MAC client - the
Generator - without performing any processing. This particular feature enables the test
tool to transmit packets containing errors, in order to examine the behavior of the tested
element under abnormal conditions.

As mentioned earlier, different FIFOs are accessed according to the port selection and
transmit/receive enable signals. Each transmit FIFO has a transmit ready signal
indicating that there is enough free space to load new data; each receive FIFO has a
receive ready signal indicating that it holds enough data to perform data transfers on the
IXBus. The ready signals are asserted according to specific threshold values the IXF440
is programmed with: a transmit ready signal is only asserted if the free space in the
FIFO is larger than a predefined threshold, and a receive ready signal is only asserted if
the FIFO holds more data than a predefined threshold. Prior to the assertion of a FIFO
status signal, the hardware module interfacing to the IXF440 chip must request to send
or receive data through the corresponding transmit and receive control input signals of
the IXF440, and set the port select signals to the value of the FIFO it wants to exchange
data with.
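The threshold-based ready signalling described above can be modelled behaviourally as follows; this is a sketch, not the IXF440 register interface, and the threshold values are illustrative rather than chip defaults:

```python
# Behavioural model of the per-port FIFO ready signals described above.
# Threshold values are illustrative, not IXF440 defaults.
FIFO_SIZE = 256                      # bytes per transmit/receive FIFO

def tx_ready(fill_bytes, threshold=64):
    """Transmit ready: asserted only when free FIFO space exceeds the threshold."""
    return (FIFO_SIZE - fill_bytes) > threshold

def rx_ready(fill_bytes, threshold=64):
    """Receive ready: asserted only when buffered data exceeds the threshold."""
    return fill_bytes > threshold
```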

Within the test tool system the IXBus is configured to operate in split mode. The split
mode configuration achieves the highest throughput for our design, as it ensures
independent data paths for the Generator and the Analyzer. Each module has exclusive
access to its own data bus, avoiding the use of the FIFO interface as a shared resource,
which would further require bus access synchronization mechanisms - i.e. bus
arbitration.


[Figure omitted: block diagram - the rear transition board holds the connectors, the
transceiver (MII) and the MAC controller with per-port transmit and receive FIFOs (ports
0 to 7); the MAC controller connects via its CPU interface, interrupt and control signals
and the IX bus (split mode) to the master board, which holds the microprocessor,
Generator, Analyzer, CAM, RAMs and PCI bridge on the local bus, with the PCI bridge
providing the cPCI bus interface.]

Figure 3-3 Test tool Gi system block diagram


4. Hardware Timer Management


4.1 Timer Management
Timer management is the key enabler of the Generator functionality within the Gi
interfacing and measuring system. As we mentioned in previous sections, the Generator
emulates the presence of users in a network by generating packets supposed to be
originating from subscribers. Packet generation is based on the expiration of a timer
associated with a particular “virtual” subscriber. Timers are maintained for all users
assumed to be present in the network - referred to as active users from now on.
Whenever an active user timer expires, a packet corresponding to the specific user has
to be assembled and transmitted to the network under test. Therefore a hardware
module implementing a timer manager is required to handle timers throughout the
simulation process. The test tool supports up to 4096 active users; consequently the
hardware module designed should be able to efficiently manage up to 4096 timers.

Recall from chapter 3 that the time base of the simulation is the tick. All timing-related
parameters of the test tool are measured in ticks, including active user timer durations.
The timer manager should decrement each active user timer every tick. The
aforementioned process may seem quite simple and straightforward; however, when
large numbers of timers must be decremented, a considerable processing load is
created and timer management complexity becomes an important consideration of the
hardware design. Furthermore, the timer manager must check the whole timer list
to locate expired timers and trigger the events leading to packet generation.
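The baseline behaviour just described - decrement every active timer each tick and collect the expirations - can be sketched as follows; this naive formulation, in which all names are ours, is exactly what makes the processing load grow with the number of users:

```python
def tick(timers):
    """Advance one tick: decrement every active user timer and return the
    users whose timers expired (each expiry triggers packet generation)."""
    expired = []
    for user, remaining in list(timers.items()):
        remaining -= 1
        if remaining <= 0:
            expired.append(user)    # a packet for this user must be assembled
            del timers[user]
        else:
            timers[user] = remaining
    return expired

timers = {1: 2, 2: 1, 3: 5}         # user id -> remaining ticks
```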

Timer management is not a problem unique to the Keletron test tool. In their vast
majority, protocol implementations use timers, making them an inherent part of
networking equipment. Timer use in protocol implementations includes part, or all, of the
following protocol processing functions

• packet loss detection. A retransmission timer measures the time window within
which an acknowledgement for a packet sent has to be received. If the timer expires,
the packet is considered lost and a retransmission event is triggered,

• connection management. Most methods for setting up and tearing down
connections rely on timers.

Additionally, we should mention that the Application Programming Interfaces (APIs) of
all modern operating systems include timer manipulation functions. The above are an
indication of the role that timer managers play in a networking system. An effective timer
management scheme is essential in the case where hundreds or even thousands of
timers are utilized. Therefore, our thesis project addressed the design and
implementation of the Generator timer manager separately, studying existing timer
manager hardware architectures and tailoring them to the particular needs of the
Keletron test tool.


4.2 Theoretical Background


The timer manager is the module responsible for starting and stopping timers, increasing
or decreasing their value and finally triggering events related to their expiration. The
state transition diagram for a timer is depicted in (Figure 4-1).

[Figure omitted: timer state transition diagram with states Idle, Running and Expired -
StartTimer: Idle to Running; StopTimer: Running to Idle; Timer Expiration: Running to
Expired; Tick or StopTimer: Expired to Idle.]

Figure 4-1 Timer state transition diagram

According to (Figure 4-1), a timer can be in one of the following states

• idle,

• running,

• expired.

When idle, the timer need not be managed by the timer manager; it has either been
stopped or has expired. When running, the timer has been started but has not expired;
the timer manager is responsible for manipulating it. Finally, a timer reaches the expired
state when its duration has elapsed but the relevant event has not been triggered yet.
The expired state is the result of implementation restrictions, stemming from the fact
that an implementation cannot check all running timers for expiration and generate
timeout events infinitely quickly.

Transitions between timer states are based on three basic functions. Specifically,
StartTimer starts a timer, moving it to the running state; if the timer was already running,
it is restarted. StopTimer stops a timer to prevent it from generating any further timeout
events. Tick checks running timers for expiration and generates timeout events
whenever they expire. StartTimer and StopTimer are invoked by an entity utilizing
timers, i.e. a protocol, whereas Tick is invoked at regular time intervals by the timer
manager.
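The three primitives and the transitions of the state diagram can be sketched as follows; this is a minimal software model with class and method names of our choosing, and a real implementation would additionally bound the work performed per Tick:

```python
IDLE, RUNNING, EXPIRED = "idle", "running", "expired"

class TimerManager:
    """Minimal model of the StartTimer/StopTimer/Tick primitives."""

    def __init__(self):
        self.timers = {}             # timer id -> remaining ticks
        self.state = {}              # timer id -> current state

    def start_timer(self, tid, duration):
        """Start (or restart) a timer; it enters the running state."""
        self.timers[tid] = duration
        self.state[tid] = RUNNING

    def stop_timer(self, tid):
        """Stop a timer so it generates no further timeout events."""
        self.timers.pop(tid, None)
        self.state[tid] = IDLE

    def tick(self):
        """Check running timers and fire timeout events for expired ones."""
        fired = []
        for tid in list(self.timers):
            self.timers[tid] -= 1
            if self.timers[tid] <= 0:
                self.state[tid] = EXPIRED
                fired.append(tid)    # timeout event triggered here
                del self.timers[tid]
        return fired

tm = TimerManager()
tm.start_timer("t1", 2)
tm.start_timer("t2", 1)
```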

4.2 Timer Management Data Structures


The issue of timer management has commonly been resolved using the “sorted list”
approach, where timers are organized in a list with the timers closest to expiration
always placed at its top. In essence, the issue is reduced to the well-known problem of
sorting, for which several software data structures and algorithms have been proposed
and implemented. This section briefly describes common data structures used in sorting
algorithms, viewed through the prism of timer management.

4.2.1 Unordered Data Structures


Schemes utilizing this approach store timers in main memory in arbitrary order, creating
a timer list that has no hierarchy or organization. As a result, the timer manager has to
inspect every timer in the list to determine whether it has expired. Timer manager
implementations based on an unordered timer list are inefficient in both hardware and
software. Specifically, in a software implementation the time necessary to check all
timers for expiration is proportional to the number of timers n, whereas a hardware
implementation would require n comparators, one per timer, occupying a large area in a
chip.

4.2.2 Ordered Data Structures


This is an improvement over the aforementioned scheme, as the list of timers is
maintained ordered according to their expiration times. The sooner a timer expires, the
higher it is located in the list, with the timer having the lowest value on top. Keeping the
list ordered reduces the number of searches needed to just one, but introduces the cost
of sorting the list on every insertion. Typical implementations of ordered data structures
include doubly linked lists and heaps. Inserting a timer into a sorted doubly linked list
requires O(n) time; inserting into a heap requires O(log n) time, proportional to the
number of levels L in the tree, where n is the number of timers stored.

4.2.3 Hashed Data Structures


Hashed data structures use a one-dimensional array in conjunction with multiple ordered
timer lists. The number of lists equals the number of elements in the array, with each
array element k forming the head of the corresponding list Lk. A hashing function H is
used to map a timer expiration time to an array index i and to insert the timer in list Li.
Latency in this approach is governed by the computation time of the hash value
H(TimerExpirationTime).
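For illustration we assume H is the expiration time modulo the array size - a common choice, though not one stated in the text - which yields a timing-wheel-like structure:

```python
import collections

# Assumed hash: expiration time modulo the array size (our choice).
WHEEL_SIZE = 8                            # number of array elements / lists
wheel = collections.defaultdict(list)     # index i -> ordered list Li

def start_timer(now, duration, tid):
    """Insert a timer into list Li, where i = H(expiration time)."""
    expiry = now + duration
    i = expiry % WHEEL_SIZE               # hashing function H
    wheel[i].append((expiry, tid))
    wheel[i].sort()                       # keep each list ordered by expiry

def tick(now):
    """Fire every timer in the current slot whose expiration time is now."""
    i = now % WHEEL_SIZE
    fired = [tid for expiry, tid in wheel[i] if expiry == now]
    wheel[i] = [(e, t) for e, t in wheel[i] if e != now]
    return fired

start_timer(0, 3, "a")                    # expires at time 3 (slot 3)
start_timer(0, 11, "b")                   # expires at time 11 (also slot 3)
```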

4.2.4 Hierarchical Data Structures


Another method to improve latency is to divide the representation of system time into z
hierarchical components and maintain an ordered list for each component. The timer
manager inserts a timer into the component list it initially belongs to, and checks the
timers in a particular list only when the component corresponding to that list has
changed. An example of this approach is the Logarithmic Timer Management Algorithm
[12], according to which timers follow a logarithmic organization. Specifically, timers
below ten ticks belong to the ones category
and are decremented every tick by one tick. Timers below one hundred ticks are in the
tens category and are decremented every ten ticks by ten ticks until their value reaches
the ten ticks threshold and they are moved to the ones category. Similarly, timers below
one thousand ticks are in the hundreds category and are decremented every one


hundred ticks by a hundred ticks, until they drop below the one hundred ticks threshold
and are moved to the tens category, and so on.
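The per-timer cost of this scheme can be sketched as follows, assuming exact decade thresholds (the helper names are ours). The number of decrement operations grows with the number of decimal digits of the timer value rather than with the value itself:

```python
def category(value):
    """Decade step of a timer value: 1 for ones, 10 for tens, 100 for
    hundreds, and so on (assumes exact decade thresholds)."""
    step = 1
    while value >= step * 10:
        step *= 10
    return step

def decrements_to_expiry(value):
    """Count decrement operations until expiry under the logarithmic
    scheme: each category period, a timer is decremented by its step."""
    ops = 0
    while value > 0:
        value -= category(value)  # timer migrates down as it shrinks
        ops += 1
    return ops
```

A timer of 345 ticks, for example, is touched only 12 times before it expires, instead of the 345 per-tick decrements the naive scheme would perform.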

4.3 Timer Management Hardware Architectures


As software timer managers became the bottleneck in overall system performance, the
next logical step was to consider specialized hardware architectures for handling timers.
Hardware timer managers attempt to accelerate execution of sorting algorithms,
deploying them in hardware. The main differentiation at the algorithmic level is that
hardware implementations introduce parallelism and pipelining in order to further reduce
execution times. Common characteristics amongst proposed solutions include the
implementation of a hardware-managed heap in main memory, rounding timer values
according to predefined thresholds and the use of Content Addressable Memories
(CAMs).

4.3.1 Heap-based
Heaps have dominated over other data structures due to their efficient sorting scheme.
When heaps are utilized, all active timers are stored in a tree and the timer having the
smallest value is always located at the root of the tree. Therefore only O(1) search
operations are required to find the timer that has expired, but O(log N) operations are
required to re-sort a heap containing N elements. Additionally, heap-based hardware
architectures deploy more than one main memory, organized in ways that increase the
level of parallelism during the heap sorting process.

An interesting memory organization scheme, accompanied by the corresponding data
structures, is presented in [13]. In [13], the children of a node are maintained in the same
memory device as an independent heap, where the root of that heap is the node’s
lowest-value child. Essentially, the problem of sorting a heap with L levels whose nodes
have C children8 each is “broken down” to sorting one of the

C^(L-2) x (1 + C)

heaps present in each level L (L > 1, since the 1st level always contains exactly one
heap). All nodes contained in a level, except for the nodes residing on the last one, have
C children organized as a binary heap, thus forming a mutation of the standard d-heap.
Additionally, the heap root does not only point to its 1st level children, but to its
corresponding 2nd level heap as well. For example, suppose we have 16 timers
organized in a modified d-heap with d = 3. Such a heap is depicted in (Figure 4-2).

The first level array would be the first heap and would contain 3 timers, in the form of a
sorted binary tree. The second level array would contain 12 timers, organized as 4
independent heaps of 3 timers each. The previously described architecture takes
advantage of the principle that maintaining a smaller number of nodes in a heap reduces
the time needed to sort the heap.

8 The number of children a heap node can have is called the degree d of the heap.


[Figure omitted: 16 timers organized in a modified heap of degree 3 - the 1st level heap
holds timers 2, 3 and 4, and four 2nd level heaps are rooted at timers 5, 8, 11 and 14,
whose children are 6-7, 9-10, 12-13 and 15-16 respectively.]

Figure 4-2 Timers organized in a heap with degree d=3

4.3.2 CAM-based
Another category of architectures utilizes CAMs in order to perform simultaneous
comparisons. The idea behind this approach is that CAMs do not store current timer
values but their calculated expiration times instead. The system time T is applied as the
search data to the CAM during every system “tick” and compared against all timer
expiration values stored therein. In case a match is found, the corresponding timer event
needs to be triggered.
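The matching step can be modelled behaviourally as follows; this is a sketch in which a dictionary stands in for the CAM array and a comprehension stands in for the CAM's parallel compare:

```python
cam = {}   # CAM entry -> stored absolute expiration time (not a countdown)

def start_timer(entry, now, duration):
    """Store the timer's wakeup time in the CAM instead of its value."""
    cam[entry] = now + duration

def tick(now):
    """Apply the system time T=now as search data; all entries are
    compared (in a real CAM, simultaneously) and matching timers fire."""
    matched = [entry for entry, expiry in cam.items() if expiry == now]
    for entry in matched:
        del cam[entry]
    return matched

start_timer(0, now=5, duration=3)     # wakeup at T=8
start_timer(1, now=5, duration=7)     # wakeup at T=12
```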

Finally, there are hierarchical architectures which follow the principle of rounding timer
values according to a predefined set of thresholds. A representative of this approach is
described in [14], where timer values in the range of 100 to 1000 are assumed and the
additional restriction of having a 10% accuracy in timer durations is imposed. Under
these limitations, [14] suggests the usage of the following series of timer durations

100, 120, 150, 180, 220, 270, 330, 390, 470, 560, 680, 820, 1000

An ordered list is maintained for each of the previous timer duration rounding values;
each element of the list contains the wakeup time of a particular timer (Figure 4-3).

During a system “tick”, the current time is compared against the first element of each
list, with expired timers yielding a match. Keeping in mind that all timers in the same list
have the same duration, newly inserted timers are always placed at the end of the list.
This settles the issue of sorting the list, since amongst timers which have the same
duration but were started at different times, the one started earlier will also be the one
to expire earlier.
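This bookkeeping can be sketched as follows, assuming timers are rounded up to the next series value - one plausible rounding policy; [14] only requires the result to stay within the stated accuracy bound:

```python
import bisect

# Timer duration series from [14]; one ordered list is kept per value.
SERIES = [100, 120, 150, 180, 220, 270, 330, 390, 470, 560, 680, 820, 1000]
lists = {d: [] for d in SERIES}      # rounded duration -> wakeup-time list

def round_duration(duration):
    """Round a requested duration (100..1000) up to the next series value
    (an assumed policy, not necessarily the one used in [14])."""
    return SERIES[bisect.bisect_left(SERIES, duration)]

def start_timer(now, duration, tid):
    """Append at the tail: since every timer in a list has the same
    duration, the list stays sorted by wakeup time with no sorting step."""
    d = round_duration(duration)
    lists[d].append((now + d, tid))

start_timer(0, 101, "a")             # rounded to 120, wakeup at 120
start_timer(5, 115, "b")             # rounded to 120, wakeup at 125
```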

When this scheme is accompanied by a CAM, as described in [14], the CAM holds the
first element of every list in order for simultaneous comparisons to be feasible.
Furthermore, rounding timer durations is an improvement over the plain CAM usage
scenario described earlier, as it reduces the number of entries in the CAM. However, the


series of timer durations should be chosen carefully, to introduce as little inaccuracy as
possible to the actual timer values.

Figure 4-3 CAM-based timer management (the timer manager's CAM holds one rounded
timer wakeup time per list; the remaining timers follow as ordered list elements)

We should note, though, that none of the aforementioned schemes addresses the issue
of multiple timers expiring simultaneously and the way such a situation should be
handled. This issue imposes cumbersome restrictions on the timer manager and
requires special handling, especially when timer management is based on the
computation of timer wakeup times, as we will explain in the following subsections.

4.4 Priority Queuing and Timer Management


As seen from the above, timer management is treated as a sorting problem, where the
entities to be sorted are timer wakeup times. A similar issue is raised in the case of
packet classification, also known as priority queuing, used to provide fine-grained
Quality of Service (QoS) guarantees. In order for QoS to be supported in a networking
device, e.g. a router, packet transmission time must be decided on a priority basis.
Priority queues ensure that the transmission of packets having more stringent QoS
requirements takes precedence over “less demanding” packets. The actual queues are
in some cases implemented as heaps and, when this happens, priority queuing becomes
similar to timer wakeup time sorting; the heap element having the lowest priority value
should always be at the root of the heap, and the heap should be resorted after a packet
has been scheduled for transmission. The aforementioned remark encouraged us to
perform a theoretical study of hardware architectures related to heap management for
priority queues.

Packet classification incorporates many architectural solutions common to the ones
implemented in timer management. An interesting finding though was the use of special
mechanisms that facilitate the pipelining of operations to be performed on a heap [15],
[16], so as to further reduce the latency related to sorting it. Their basic principle is to
utilize special data structures and datapaths which increase the parallelism in the heap
sorting process. In order to support pipelining, these architectures require storing the N
levels of the heap in N memory devices, having each level reside in a different device.
Furthermore, additional hardware is needed for the N simultaneous comparisons of the
sorting process to be performed. The aforementioned architectures base pipelining on
the fact that a new operation can be performed at each level as soon as the current
operation on that level has been processed. Each stage of the pipeline is dedicated to
performing operations on a particular level of the tree.

Finally, [17] and [18] present sorting algorithms designed specifically for hardware
implementation. These algorithms are substantially different from the traditional
approaches presented earlier. However, their complexity is high and, especially in the
case of [17], the results obtained do not seem to justify it. Specifically, when N elements
need to be sorted the architecture requires N/2 comparators, whereas the sorting time is
O(N), relatively high for a special-purpose sorting architecture. On the contrary, the
conclusions presented in [18] are quite interesting, as according to the authors, N
elements can be sorted in O(N log N / (p log p)) time for all ranges of N using a p-sorter⁹.

4.5 Test Tool Timer Management


The test tool timer manager is responsible for triggering events that lead to packet
assembly and transmission by the Generator module. Whenever an active user timer
expires, a packet corresponding to the specific user is sent to the network under test.
The duties of the timer manager include

• examining active user timer values every tick,

• locating timers that have expired and notifying the Generator sub-modules of the
actual event that should be triggered,

• handling timer values.

Depending on the implementation, value handling translates into either calculating new
timer wakeup times whenever timers expire, or decrementing timer values every tick and
resetting them to their preset value once they expire.

For the purposes of the thesis project, three timer management architectures were
designed. The solutions were presented to Keletron and the final choice was based
upon Keletron requirements. This section includes an overview of the three timer
management hardware architectures developed.

4.5.1 Timer Wakeup Times vs. Counters


One critical design decision was whether to base timer management on counters,
implementing each timer as a counter which is decremented by one every tick, or on the
calculation of their wakeup time, where each timer wakeup time is stored in memory and
compared against the current simulation time. The basic disadvantage of the second
approach is the fact that a new timer wakeup time can only be estimated when the
current wakeup time has been reached and the related event has been triggered. In
case the timer timeout is not served for some reason, then the user could be “evicted”
from the system. To avoid such an undesirable side effect, all timers should always be
checked for expiration in every tick, a quite cumbersome and error-intolerant
requirement.

⁹ A p-sorter is a sorting device capable of sorting p elements in constant time.

Alternatively, there could be mechanisms to overcome this situation, e.g. a “time
window”. All timer values falling within the window could be served as if they had expired,
or their expiration could be ignored, having the timer manager calculate their new
wakeup times. In either case, a level of complexity would be introduced in the timer
manager, complicating the whole procedure and the actual implementation. On the
contrary, the first approach is far simpler, since the timer manager can
decrement each counter, check it for expiration and then store it back to memory. If the
timer manager cannot decrement all timers it is in charge of, the system does not “evict”
users; it simply introduces inaccuracy to the non-decremented timers.

4.5.2 Distributed Timer Management


The first approach is straightforward and quite simple to implement. It is based on an
unordered data structure and works as follows. Suppose we have t timers. The t timers
are organized in g groups, each group Gk containing

Tk = t/g

timers. The timers of each group Gk are stored in a private memory, to which the group's
timer manager has exclusive access (Figure 4-4). The Tk timers of a group are not
maintained in a sorted or hierarchical manner. The timer manager responsible for the Tk
timers starts fetching timers from its memory at the beginning of each tick. Each timer is
decremented by one and stored back to memory. If a timer value reaches zero, the timer
has expired and the corresponding event is triggered. The expired timer is then reloaded
with its preset value and stored back to memory.
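In software, one such timer-manager engine can be sketched as follows (the thesis implements this in VHDL; the Python below is only a behavioral model with names of our choosing):

```python
def run_tick(timers, presets):
    """One tick of a distributed timer-manager engine. `timers` models the
    group's private RAM holding the current counter values, `presets` the
    reload values. Returns the indices of the timers that expired."""
    expired = []
    for i in range(len(timers)):
        timers[i] -= 1              # fetch and decrement by one
        if timers[i] == 0:          # expired: trigger the corresponding event
            expired.append(i)
            timers[i] = presets[i]  # reload the preset value
        # the (possibly reloaded) value is stored back to memory
    return expired
```

Each engine walks its whole list once per tick, so expirations are detected with at most one tick of latency.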

Figure 4-4 Distributed timer management architecture (each timer manager keeps its
list of Tk timers in its own private RAM)

The aforementioned scenario satisfies the requirement that at least one event can be
served within a tick. Additionally, apart from narrowing down the problem of checking t
timers to checking Tk timers, deploying multiple parallel engines has the advantage of
enabling the Generator to serve at least g simultaneous timer expirations per tick,
provided that a timer manager can examine all timers it is responsible for within a tick
and that simultaneously expired timers reside in different timer groups.

The disadvantage of using multiple parallel engines is the increase in hardware
resource utilization. Taking into consideration the requirement to store each timer group
Gk in a different memory device yields a total of g devices needed for implementation;
the number of devices needed grows with the number of timer groups g. On
the other hand, the more timer groups there are, the less timer inaccuracy is introduced,
as each timer manager will be responsible for a smaller number of timers, thus having
more time to examine all of them for expiration.

4.5.3 Distributed Heaps Timer Management


This approach resembles the first one in many ways and was inspired by the memory
organization used in [13], with the difference that all heap levels are stored in the same
memory, to reduce the number of memory devices used - recall that in [13] each level
was stored in a different memory device. Again, multiple autonomous parallel timer
managers are utilized. The exact number of timer managers deployed ensures that
timers can be organized in heaps in the manner presented earlier, where each heap
node contains the same number of children as every other node and the node children
form a balanced binary tree¹⁰.

For the sake of our discussion and for clarity, we will present an example based
on the 4096 timers the test tool timer manager should be able to support.
particular case, a total of 64 timer managers could be used, where each timer manager
handles 64 users. Using 64 timers per group allows for timers to be organized in a
modified d-heap with d = 7, forming a tree with 2 levels. Each node has 7 children, which
are internally organized as a binary heap with the smallest timer value always being the
root of the tree (Figure 4-5). Each one of the 1st level binary tree nodes points to its
children residing on the 2nd level binary tree. Such a memory organization minimizes the
time needed to sort the tree.

The result of having timers sorted is that finding timers expired requires a single clock
cycle. On the other hand, keeping timers sorted adds the overhead of resorting timer
heaps each time a timer expires or a new timer is inserted in the tree. Furthermore, there
is the issue of preserving the heap properties. The basic property of the heap used here
is that a node's children always have values greater than their father's [19]. This property
prevents us from maintaining simultaneously expiring timers in a heap and furthermore
adds the requirement that whenever two such timers exist, a “conflict resolution” scheme
should differentiate the value of the two timers. Additionally, conflict resolution should be
fair, in the sense that if a timer conflicts with another timer and its value is changed in
some way, then if the new timer value conflicts with another timer value, the second
timer value should be altered, so as not to introduce excessive latency to the first timer.
Conflict resolution comes at a cost; the side-effect is the introduction of inaccuracy to
timer durations.

¹⁰ A tree is balanced when each subtree of the root has an equal number of nodes. Put differently, all
balanced tree nodes have the same number of children.


Two operations are supported on the heap

• Insert.

Inserts a new timer into the heap and places it in the appropriate position. Initially the
newly inserted timer is placed in the leftmost empty position of the tree and, according
to its value, it is shifted upwards until it meets a node with a value less than its own.
Insert operations occur when a new timer is started, as well as when a timer expires, in
which case it is reloaded with its preset value and reinserted into the heap.

• Delete.

Deletes a timer from the heap. This occurs whenever a timer expires, in which case it
is removed from the tree and the timer with the smallest value amongst timers
residing in the heap must be placed as the new root of the tree.

The algorithm designed for distributed heaps timer management searches for and
resolves timer duration conflicts during both operations. The conflict resolution scheme
of the insert operation consists of simply incrementing the timer value of the timer
inserted in the heap by one, so as to give the already present timer precedence over the
newly inserted one. In delete operations, conflict resolution is based on a Round Robin
algorithm, according to which, if two child nodes have the same timer value, they are
decremented alternately.
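The insert-side conflict resolution can be sketched as follows. This is a simplified model: it uses a flat binary min-heap instead of the two-level organization described here, and it repeats the increment until a free value is found, which is one possible way of handling chained conflicts:

```python
import heapq

def insert_timer(heap, wakeup):
    """Insert a timer value into a min-heap, incrementing it by one for as
    long as it collides with a value already present, so that the timer
    already in the heap takes precedence over the newly inserted one."""
    present = set(heap)
    while wakeup in present:
        wakeup += 1  # conflict: defer the new timer by one tick
    heapq.heappush(heap, wakeup)
    return wakeup    # the (possibly adjusted) value actually stored
```

Each deferral adds one tick of inaccuracy to the new timer, which is the cost of conflict resolution noted above.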

Figure 4-5 Timer manager heap organization (the 1st level heap contains the root's
children; the 2nd level heaps contain the 1st level nodes' children)


For example, if nodes 10, 11 of the 2nd level binary tree have the same value, then
initially node 10 value will be decremented by one and node 10 will be shifted upwards.
The next time nodes 10, 11 have the same timer duration, node 11 will be decremented
by one and shifted upwards. Note that the specific conflict resolution scheme maintains
the heap property, as the decremented timer duration is always the one moved to a
father node and therefore remains lower than any of its children's timer duration
values.

One important point to further analyze is the insert and delete operations performed on
the heap. During the insert operation, the newly inserted timer is placed at the leftmost
empty position of the binary tree residing in the 2nd level and shifted upwards. In case
the timer has a value greater than that of its father - residing on the 1st level binary tree -
the timer is shifted upwards to the first level of the heap. If the timer is further shifted up
in the 1st level binary tree and exchanges positions with another 1st level tree timer value,
it still points to the 2nd level tree into which it was initially inserted. If this was not the case
and timers exchanged positions in the 1st level heap, then the timer with which the newly
inserted timer would exchange positions, would become the new father of the 2nd level
heap and a possible 2nd level tree resorting would be needed. This would happen due to
the fact that the new father of the 2nd level tree could have a value bigger than that of the
root of the 2nd level tree; therefore, the two nodes should exchange places and the initial
father of the 2nd level tree should be shifted down to the 2nd level tree to its appropriate
position. To avoid an analogous implication during the delete operation, the root of the
2nd level tree to which the root of the 1st level tree was pointing is shifted upwards to
the 1st level tree and placed at the appropriate position, always pointing to the 2nd level
tree from which it originated.

Using n parallel heaps allows at least n simultaneously expired timers to be served,
provided that they reside in different memory devices and that each timer manager is
able to perform a delete and insert operation within a tick, but requires n memory
devices due to having each heap stored in its own device. Last but not least, let us
mention that the example presented throughout this section was an ideal case where the
heaps formed were governed by properties allowing a perfectly balanced heap
organization. This does not imply that the heap must be balanced to apply the
aforementioned algorithm. However, the delay related to sorting the heap will be
proportional to the number of heap levels.

4.5.4 CAM-based Timer Management


As mentioned earlier, the use of Content Addressable Memories results in a simple and
straightforward timer manager implementation, based on the computation of timer
wakeup times. The wakeup times of all timers present in the system are stored in a CAM,
and the current simulation time is compared against the contents of the CAM in every
tick. If a match is found, the corresponding event is triggered.

Keletron test tool incorporates a CAM which is currently not being utilized, though plans
for future use do exist. As a result of that, a timer management solution based on CAMs
was meaningful. However, during the study of the features of the particular CAM used in
the system, the findings were discouraging. First of all, the CAM does not support
multiple matches and therefore restricts having two entries with the same value inside
the CAM. As a result, simultaneously expiring timers cannot be collocated in the
CAM. One solution to this problem would be to make sure that no two timer wakeup
times have the same value, e.g. by incrementing the value of a newly inserted timer
wakeup time by one. Nevertheless, such an approach could lead to increased
complexity and timer inaccuracy. Consider the case of having n timers with the same
wakeup time T. For the nth timer to be inserted in the CAM, n comparisons would have to
be performed to find out which of the values between T and T + n are already occupied.
To make things even worse, consider the possibility that other timer wakeup times
between T and T + n have already been inserted. Even for small values of n, in the range
of 20 to 50, the previously described situation significantly complicates the timer
manager implementation.

What makes the use of the specific CAM intrinsically prohibitive is the fact that it stores
its values in sorted order, from the lowest at the top of the table to the highest at the
bottom. Consequently, each time a new timer wakeup time is inserted, the CAM has to
reorder its contents. The CAM datasheet [20] analyzes the worst-case scenario for this
situation, which yields a value of approximately 80 µs for a single CAM insert operation,
clearly unacceptable. However, newer CAM generations, such as the ones offered by
MUSIC Semiconductors [21] and MOSAID Technologies [22], support multiple matching
operations and do not require data to be stored in a sorted manner.

4.5.5 Timer Manager Architecture


The hardware architectures described earlier were presented to Keletron, which finally
decided to base the test tool timer manager on the distributed timer management
approach. Though as we mentioned earlier a solution utilizing unordered data structures
is in general considered inefficient in terms of both hardware and software
implementation, it was considered satisfactory for the test tool for two basic reasons

• simplicity,

• relatively low hardware utilization.

The simplicity of the particular approach resulted in a fast and straightforward
implementation with easily predictable behavior. Additionally, the architecture took
advantage of the fact that the timer manager should perform all timer related functions
not within a single clock cycle, but within a tick. Therefore, the number of actual timer
manager modules deployed did not need to be equal to the number of timers. The
original constraint was that each timer manager module should be able to examine the
whole list of timers it was in charge of within a tick.
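This constraint can be checked with a back-of-the-envelope calculation; the figures in the example below (tick length, clock frequency, cycles per timer) are hypothetical and do not come from the actual test tool:

```python
def max_timers_per_manager(tick_us, clock_mhz, cycles_per_timer):
    """Upper bound on the number of timers one manager can examine per tick:
    it must read, decrement/compare and write back every timer it owns
    within the tick."""
    cycles_per_tick = tick_us * clock_mhz  # microseconds x MHz = clock cycles
    return cycles_per_tick // cycles_per_timer
```

For instance, with a 100 µs tick, a 50 MHz clock and 4 cycles per timer, one manager could handle up to 1250 timers.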

Nevertheless, a solution based on heaps would probably scale better for a larger
number of timers and will be considered in future versions of the test tool, which shall
support more users. Heaps provide faster matching times, but introduce the complexity
of resorting. On the other hand, the heap-based solution proposed by the thesis would
require additional effort for designing the appropriate data structures and implementing
the heap. Moreover, as we shall see in chapter 7, the solution adopted proved more
than adequate for the needs of the hardware design.


5. Field Programmable Gate Arrays


5.1 Test Equipment Hardware Flexibility
The Keletron test tool is able to emulate the Iu-PS, Iu-CS, Gn and Gi interfaces of the
UMTS network. In chapter 2, we have presented the protocol stacks of the particular
interfaces and their data transfer mechanisms; Asynchronous Transfer Mode and
Ethernet. Each data transfer mechanism requires different circuitry to handle the
corresponding layer one and two functions, as well as glue logic so that the hardware
performing higher layer processing can communicate with the circuitry in use. Additionally,
higher layer processing functions should be flexible and configurable, allowing the test
tool to adapt to evolving standards and/or enhance currently provided functionality
without significant modifications in its hardware architecture. The test tool incorporates
the flexibility required based on

• the isolation of the test tool master card from layer one and two functions, by
means of the rear transition modules,

• the Field Programmable Gate Arrays present on the master board.

Field Programmable Gate Arrays (FPGAs) are reconfigurable hardware, which can be
tailored to the needs of the target application, thus avoiding the design of different
hardware versions for each particular application. The test tool FPGAs contain the logic
required to construct and transmit data and signaling packets, whose format is based on
the protocol stack emulated. Furthermore, they are used to implement the glue logic
needed by the master board to interface with rear transition modules.

This chapter provides an overview of FPGA architecture, focusing on its building
elements and the way they are used to realize user-defined logic. Moreover, a
description of the hardware resources present on the master board FPGAs is included,
to outline the limitations they impose on the Generator and Analyzer hardware design.

5.2 Field Programmable Gate Arrays


Field Programmable Gate Arrays (FPGAs) are digital integrated circuits containing
relatively simple programmable blocks of logic surrounded by programmable
interconnects. They were introduced to fill the gap between Application Specific
Integrated Circuits (ASICs) and Programmable Logic Devices (PLDs). On the one hand,
ASICs can support large and complex functions, but are time-consuming to design. Their
cost is high, and most importantly once an ASIC is fabricated, its functionality can no
longer be extended - at least not without using external complementary circuitry. On the
other hand, PLDs are highly reconfigurable and reduce design time, but they cannot
support large, complex functions. The scope of FPGAs was to combine the capabilities
of ASICs with the ease of configuration of PLDs.

5.2.1 FPGA Architectures


FPGA architectures can be divided into two major categories, based on the structure of
the logic blocks they incorporate [23]

• fine-grained,

• coarse-grained.

Fine-grained FPGA logic blocks can only be used to implement very simple functions,
e.g. a 3-input AND gate or a storage element, whereas coarse-grained architecture
blocks include a larger amount of logic, allowing them to offer more sophisticated
functionality. The differentiation in logic block structure introduces an additional
advantage for coarse-grained architectures, enabling them to deploy fewer connections
to/from the blocks. Fewer connections result in reduced signal propagation delays in
comparison to fine-grained FPGAs, whose logic blocks are interconnected using a
relatively high number of tracks. This feature was the major reason that engineers
eventually converged on coarse-grained architectures.

5.3 FPGAs Building Modules


To further continue our discussion we need to establish a common ground for the
FPGA-related terms used, since each vendor has established its own terminology for the
building elements of its FPGA architecture. Due to the fact that Xilinx FPGAs were
utilized throughout the specific project, Xilinx terminology shall be used. Nevertheless,
the key concepts of hierarchical FPGA organization to be presented apply to all coarse-
grained FPGAs.

5.3.1 Logic Blocks


As we pointed out, the basic elements of FPGAs are logic blocks and programmable
interconnects. Each logic block consists of a number of function implementation units,
which contain the primitive building elements of the FPGA. Xilinx refers to such a block
as Configurable Logic Block (CLB) and to the corresponding function implementation
unit as Slice. A Slice comprises the following primitive building elements.

• Look-Up Tables (LUTs),

• multiplexers,

• flip-flops,

• fast carry logic.

LUTs act as function generators, capable of implementing any arbitrarily defined boolean
function. They have a specific number of inputs - 4 in Xilinx FPGAs - and contain all
possible outputs of the boolean function. A particular output is selectable by the
corresponding combination of inputs. LUT operation is best demonstrated using an
example. Suppose we wish to implement the following logic function using a LUT
(Figure 5-1):


y = x | z | w

The LUT contains all possible function outputs, whereas inputs are applied as select
signals to an 8:1 multiplexer in order to select the correct output.
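The LUT principle can be modeled in a few lines of Python (purely illustrative, not vendor code): the table stores every possible output, and the inputs select one entry, exactly like the select lines of the 8:1 multiplexer.

```python
def make_lut3(func):
    """Fill an 8-entry table with the outputs of an arbitrary 3-input
    boolean function, as a LUT does at configuration time."""
    return [func(x, z, w) for x in (0, 1) for z in (0, 1) for w in (0, 1)]

def lut_read(table, x, z, w):
    # The inputs act as multiplexer select lines choosing one stored output.
    return table[(x << 2) | (z << 1) | w]

# The OR function y = x | z | w from Figure 5-1.
OR3 = make_lut3(lambda x, z, w: x | z | w)
```

Reconfiguring the FPGA amounts to refilling the table with a different function's outputs; the lookup mechanism itself never changes.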

Furthermore, many FPGA vendors extend LUT functionality, allowing LUTs to be
configured as a small block of Random Access Memory (RAM) or a shift register, apart
from their primary role as lookup tables. Multiplexers enable the Slice to implement more
complex functions, with a larger number of inputs than that supported by a single LUT.
Finally, the fast carry logic facilitates effective implementations of arithmetic operations
and counters.

logic function truth table for y = x | z | w:

x z w | y
0 0 0 | 0
0 0 1 | 1
0 1 0 | 1
0 1 1 | 1
1 0 0 | 1
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1

The LUT stores the eight y values; x, z and w drive the select inputs of an 8:1
multiplexer that outputs the selected value.

Figure 5-1 LUT logic function implementation

As we can see from the above, FPGAs are internally organized in a hierarchical way,
where logic blocks are the topmost elements. The hierarchical architecture aims at
providing fast interconnects between building blocks, having FPGA interconnections built
on a similar hierarchy. A slice's internal interconnections are faster than the ones
between slices belonging to the same CLB. In the same fashion, interconnections
between CLBs are slower than the ones between slices. When compared to a flat
routing hierarchy, where a slice would have to access routing resources exterior to the
CLB to interconnect to another slice within the same CLB, the aforementioned scheme
offers significant improvements in terms of signal propagation speed.

5.3.2 Embedded RAM and Multipliers


Apart from CLBs, FPGAs have gradually introduced additional hardware resources to
enhance their functionality. Current FPGAs include RAM memory embedded in the CLB
array, distributed in autonomous, configurable units referred to as block RAMs.
Depending on the FPGA architecture, the block RAMs can be positioned in the periphery
of the device, distributed inside the chip in relative isolation, or organized in columns.
Each block of RAM can be configured to operate in various widths and depths.
Furthermore, it can be used as stand-alone memory or in conjunction with additional
block RAMs to realize larger blocks of storage. Common uses of such blocks include,
but are not confined to, single-port or dual-port RAMs and First-In First-Out (FIFO)
queues.

FPGAs also incorporate hardwired multiplier blocks, since many applications -
particularly those related to Digital Signal Processing - necessitate multiplier utilization. An
additional reason for deploying dedicated multiplication hardware is the fact that a
multiplier implementation based on the interconnection of programmable logic blocks
would be inherently slow. Multiplier blocks are usually located in close proximity to block
RAMs, as it is common practice to combine functionality offered by both resources;
specifically to store multiplier inputs in block RAMs.

5.3.3 Clocking Scheme


All synchronous elements within the FPGA need to be driven by a clock signal, which is
usually supplied via a special clock input pin. The clock signal must be routed through
the device and connected to the appropriate synchronous elements. If no special
clocking circuitry were to be used, the clock signal would be distributed as a single long
track to all FPGA flip-flops one after the other. The aforementioned situation would lead
to a phenomenon known as clock skew, where flip-flops “see” the clock phase-shifted by
a certain amount depending on their relative position in the clock track. The flip-flop
closest to the clock pin sees the clock signal much sooner than the one at the end of the
track.
Figure 5-2 Clock tree example (the external clock signal entering the FPGA branches
through the clock tree to reach every flip-flop)

To avoid clock skew, FPGAs deploy what is known as a clock tree. A clock tree is
implemented using special tracks, separated from the general-purpose interconnect.
Inside the clock tree, the main clock signal branches again and again to ensure that all
flip-flops see their versions of the clock signal as close together as possible. FPGAs
commonly have multiple clock input pins and deploy multiple clock trees, each forming a
clock domain. A very simplistic clock tree is depicted in Figure 5-2.

5.3.4 Clock Managers


A clock input can be either directly routed to a clock tree or fed as input to special clock
management modules, the clock managers. Clock managers are driven by the chip’s
clock input pins and generate a number of daughter clocks, which drive internal clock
trees or output pins used to provide clocking signals to other chips on the board.
Additionally, clock managers perform a number of clock signal manipulation tasks the
most important of which are

• frequency synthesis,

• phase shifting,

• clock de-skew,

• jitter removal.

The daughter clocks produced can be delayed with respect to the input clock by means
of phase-shift processing. Usually clock managers allow for a selection between
predefined phase shift values such as 90°, 180° and 270°, but there are also clock
managers that allow for configuration of the exact amount of phase shift required for
each daughter clock. Frequency synthesis refers to the clock manager's capability to
generate daughter clocks within a wide range of output clock frequencies, derived by
multiplication or division of the original clock input (Figure 5-3).

Figure 5-3 Clock manager frequency synthesis and phase shifting
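Frequency synthesis thus amounts to scaling the input clock by a multiply/divide pair; a minimal sketch follows (the function and parameter names are ours, and real clock managers restrict the allowed multiplier and divider ranges):

```python
from fractions import Fraction

def daughter_clock_mhz(f_in_mhz, multiply, divide):
    """Daughter clock frequency produced by frequency synthesis:
    the input clock multiplied and divided by configurable integers."""
    return Fraction(f_in_mhz) * multiply / divide
```

For example, a 100 MHz input with multiply = 9 and divide = 4 would yield a 225 MHz daughter clock.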

Clock de-skew is the procedure of monitoring and correcting daughter clocks by phase-
aligning them to the input clock. The daughter clocks generated by the clock manager
are relatively delayed compared to the input clock, primarily due to clock manager
processing and interconnect delays inherent in clock distribution throughout the board.
As a result of that, the daughter clock may lag behind the input clock by some amount
leading to what we referred to earlier as clock skew. Finally, clock managers are used to
detect and correct jitter. The term jitter refers to short-term variations of signal transitions
from their ideal position in time. Practically speaking, external clock edges may not arrive
at the FPGA clock input pad at the exact time they should, but a little earlier or
later instead, resulting in a fuzzy clock signal being fed to the FPGA (Figure 5-4).

Figure 5-4 Clock de-skewing and jitter removal

5.3.5 General-purpose I/O Pins


FPGAs provide a large number of pins for Input/Output (I/O), which are organized in the
periphery of the device as I/O banks. An I/O bank contains a standard number of I/O
pins and can be handled independently of any other bank. I/O banks are configurable
so that they can conform to a number of I/O signaling standards - e.g. the Peripheral
Component Interconnect specification - and each I/O bank can be configured
individually to support a particular standard. This approach is the most beneficial one
both for engineers and FPGA vendors, as it would make no sense to have FPGA chips
supporting a single standard, or to have each I/O bank hard-wired to a particular one.
The former would lead to a large number of variations of the same chip and would force
engineers to deploy multiple FPGAs whenever various standards had to be incorporated
in a design, whereas the latter would lead to a large number of unused I/O pins
wherever a single standard was used.

5.3.6 Gigabit Transceivers

The traditional means of moving data between hardware devices has been the bus. A
bus consists of a number of adjacent tracks on the board having the same electrical
characteristics. As device operating speeds increased, there was a need for higher-
throughput chip intercommunication mechanisms, leading to the design and
deployment of wider and faster buses. However, bus deployment complicates printed
circuit board design, as tracks occupy more space on the board and introduce signal
integrity issues. This observation has led FPGA vendors to include some form of
gigabit transceiver block in their FPGAs. A gigabit transceiver uses one pair of
differential signals11 to transmit and receive data. Gigabit transceiver blocks include a
collection of gigabit transceivers, leading to higher data transfer speeds and saving
board space.

11 A pair of differential signals is a pair of signals always carrying opposite logical values.


5.3.7 Embedded Processor Cores


Originally, microprocessors were discrete devices on the board, communicating with
FPGAs through their generic interfaces. However, FPGA vendors decided to go a step
further and start embedding microprocessor cores inside the FPGA. This approach
provides a number of advantages, the obvious ones being the elimination of additional
tracks and pads on the board, the reduction of interconnect delays - on-chip
communication is faster than inter-chip communication - and finally board space
savings, due to the fact that there is no need for a separate microprocessor device.
There are two trends for embedding microprocessors into FPGAs:

• physically integrating the microprocessor into the chip,

• configuring a group of programmable logic blocks to act as a microprocessor.

The former is called a hard core, whereas the latter is a soft core. Hard cores can be
located in a strip - referred to as the stripe - to the side of the FPGA along with
dedicated RAM and I/O circuitry, or directly inside the FPGA. Note that more than one
core may be present in the FPGA. Soft cores on the other hand are slower and more
primitive than hard cores, but retain the advantage of being optional. An engineer uses a
soft core only when needed and can create as many instances as necessary. As usual,
the final decision between hard and soft cores depends on the needs of the hardware
design.

5.3.8 Programmable Interconnects


The FPGA is in essence a collection of processing and storage elements, placed inside
a “sea” of programmable interconnects. Interconnects follow the hierarchical
organization of CLBs, providing signal propagation delays proportional to the proximity
of the elements being interconnected. To achieve such functionality, a significant area of
the FPGA is covered with tracks. Tracks go through switching elements, programmed to
form the actual interconnection path. The common practice is to include local
interconnects within the same slice, as well as interconnects between two slices and
between CLBs that go through the switching elements. Last but not least, there are
global interconnection paths to transport signals across the chip without having to go
through multiple switching elements.

5.4 Xilinx Virtex-II FPGAs


Having provided a description of the general FPGA architecture, we will present an
overview of the Xilinx Virtex-II Field Programmable Gate Array (FPGA) family of devices
[24]. The test tool includes two such devices, which contain the hardware logic required
to assemble and transmit data and signaling packets, as well as to parse received
packets and perform statistics gathering. Throughout this section we will describe the
resources Virtex-II FPGAs incorporate, concluding with a detailed enumeration of the
actual test tool Virtex-II FPGA resources.

5.4.1 Configurable Logic Blocks


The basic logic block of Xilinx FPGAs is the Configurable Logic Block. CLBs are
organized in a symmetrical array and are used to build the hardware design. A CLB
consists of 4 identical slices, organized in two columns of two slices each. A slice
contains two 4-input Look-Up Tables (LUTs), wide-function multiplexers, storage
elements, arithmetic logic gates and carry logic. Each LUT is programmable as a 4-input
function generator, 16 bits of distributed RAM, or a 16-bit shift register.

When configured as RAM memory, the LUTs within a CLB can implement 16x8, 32x4,
64x2 and 128x1 single-port RAM or 16x4, 32x2, 64x1 dual-port RAM. When configured
as shift registers, the LUTs within a CLB can implement a shift register up to 128 bits
long. Multiplexers can be combined with function generators to produce any logic
function with more than 4 inputs or shift registers longer than 16 bits. Arithmetic logic
gates and carry logic facilitate the implementation of fast addition and subtraction.
Finally, CLBs contain tri-state buffers used to drive on-chip busses.
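As a quick cross-check of the distributed-RAM configurations listed above (our own arithmetic, not a quote from the datasheet): a CLB holds 4 slices of 2 LUTs of 16 bits each, i.e. 128 bits, and dual-port operation consumes two LUTs per stored bit, which is why the dual-port configurations hold exactly half as much.

```python
# Sanity check of the CLB distributed-RAM aspect ratios quoted above.
# A CLB has 4 slices x 2 LUTs x 16 bits = 128 bits of LUT storage;
# dual-port RAM uses two LUTs per bit, leaving 64 usable bits.

single_port = [(16, 8), (32, 4), (64, 2), (128, 1)]  # (depth, width)
dual_port = [(16, 4), (32, 2), (64, 1)]

assert all(depth * width == 4 * 2 * 16 for depth, width in single_port)
assert all(depth * width == 4 * 2 * 16 // 2 for depth, width in dual_port)
print("all configurations fit the CLB's 128-bit LUT budget")
```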

5.4.2 Block SelectRAM


Virtex-II devices incorporate large blocks of dedicated RAM memory, to which Xilinx
refers as block SelectRAM. Each block SelectRAM is an 18Kbit RAM region, containing
two independently clocked and controlled ports; it can be configured to operate as a
single or dual-port RAM of various depths and widths (Table 5-1).

Table 5-1 Block SelectRAM single and dual-port configurations

depth    width (bits)
16K      1
8K       2
4K       4
2K       9
1K       18
512      36

The advantage of the 9-bit, 18-bit and 36-bit widths is the ability to store a parity bit for
each 8 data bits; in these configurations the actual memory width can be seen as 8+1,
16+2 or 32+4 bits. However, parity bit generation and verification is entirely delegated to
user-defined logic. As a dual-port RAM, each port has access to a common 18Kbit
memory resource. The control signals of each port are independent and each port's data
width can be configured autonomously. Block SelectRAM memories are organized in
either four or six columns. The number of blocks per column depends on the Virtex-II
device array size and is equal to the number of CLBs present in a column divided by
four.
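The aspect ratios of Table 5-1 and the parity discussion can be tied together with a small check (our own illustration): every configuration fits in 18Kbit, and exactly the widths divisible by 9 use the extra 2Kbit for parity, leaving 16Kbit of data bits in all cases.

```python
# Check that each block SelectRAM aspect ratio fits the 18Kbit block
# and that all configurations carry the same 16Kbit of data bits; the
# 9-, 18- and 36-bit widths add one parity bit per 8 data bits.

K = 1024
configs = [(16 * K, 1), (8 * K, 2), (4 * K, 4),
           (2 * K, 9), (1 * K, 18), (512, 36)]  # (depth, width)

for depth, width in configs:
    assert depth * width <= 18 * K              # fits the 18Kbit block
    parity = width % 9 == 0                     # widths 9/18/36 carry parity
    data_bits = depth * width * 8 // 9 if parity else depth * width
    assert data_bits == 16 * K                  # 16Kbit of data everywhere

print("all aspect ratios fit an 18Kbit block SelectRAM")
```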

5.4.3 Multipliers
Multipliers are also organized in blocks, where each multiplier block is associated with a
block SelectRAM. The multiplier block is a dedicated 18x18-bit multiplier, optimized for
operations based on inputs from the block SelectRAM on one port. Nevertheless, it can
be used independently of the block SelectRAM.

5.4.4 Clocking
Virtex-II devices have 16 clock input pins that can also be used as regular user
input/output pins. Clock management is facilitated by Global Clock Multiplexer Buffers
and Digital Clock Managers (DCMs). Global clock buffers are used to distribute the clock
to some or all synchronous logic elements; they represent the input to the dedicated low-
skew clock tree distribution circuitry present in Virtex-II devices. Such a buffer can either
be driven directly by an input clock pad or by a DCM. The DCM is responsible for the
clock management tasks described earlier, specifically clock de-skew, frequency
synthesis and phase shifting. DCMs are placed at the top and bottom of each block
SelectRAM and multiplier column. Global clocks are complemented by local clock
resources, which can be used for a number of applications such as the implementation
of Double Data Rate12 SDRAM interfaces.

5.4.5 Input/Output Blocks


Input/Output Blocks (IOBs) provide general-purpose I/O functionality to the Xilinx
FPGAs. They are located on the perimeter of the device in groups of two or four and
constitute the interface of the FPGA to external devices. Each IOB contains one pad
accompanied by configurable storage elements, used to implement memory interfaces,
I/O standards and differential signaling standards. IOBs support a total of 19 different I/O
standards; their mode of operation is programmable, enabling them to act as inputs,
outputs or both. Differential pairs are always implemented using two adjacent IOBs.

5.4.6 Routing Resources


Routing in Virtex-II FPGAs is based on Xilinx Active Interconnect Technology, which in
essence is a fully programmable signal routing matrix (Figure 5-5).
Figure 5-5 Virtex-II routing resources

12 Double Data Rate memories are able to transfer data twice within a clock cycle, both on the rising and
falling edge of the clock signal, thus doubling memory bandwidth.


The routing matrix consists of switching elements, the switch matrixes. All FPGA
resources - CLBs, IOBs, block SelectRAMs, multipliers and DCMs - access the global
routing resources via an identical switch matrix. We can thus think of the Virtex-II device
as an array of logic blocks attached to switch matrixes (Figure 5-5).

Figure 5-6 Xilinx hierarchical routing resources

Virtex-II devices contain a total of 16 global clock lines and 24 long lines per row or
column of the switch matrix, as well as secondary and local routing resources that
provide fast interconnections (Figure 5-6). The routing resources between any two
adjacent switch matrix rows or columns form a routing hierarchy. Global routing
resources comprise the 24 long lines per row or column. Fast routing resources include
120 horizontal and 120 vertical hex lines, which route signals to every third or sixth block
away in all four directions. Additionally, there are 40 horizontal and 40 vertical double
lines, which route signals to every first or second block away in all four directions. Each
CLB has 16 direct connections, routing signals to neighboring CLBs. Finally, local
interconnects consist of 8 direct connect lines that connect LUT outputs to LUT inputs
within the same CLB.

5.5 Test Tool Virtex-II Device


The Virtex-II family of FPGAs includes a number of members, differentiated by the
hardware resources they contain, in order to provide various cost/feature combinations.
All devices incorporate the building elements described previously - IOBs, block
SelectRAMs, multipliers, CLBs, DCMs and switch matrixes; they are distinguished by the
actual element quantities they deploy. When implementing a system in an FPGA, the
resources available impose restrictions on the system hardware architecture and should
be taken into consideration throughout the design process. The particular Virtex-II device
used in the test tool contains a total of 484 I/O pins, 3584 CLBs, 96 multipliers and 96
block SelectRAMs. In chapter 6, we shall describe the way the Generator and Analyzer
architectures utilize these resources to realize the Gi interfacing and measuring system.


6. Gi Interfacing and Measuring


6.1 OSI Reference Model, Encapsulation and
Decapsulation
The Open System Interconnection reference model13 consists of seven layers, each
specifying a subset of network functions. Its upper layers deal with application issues,
whereas the lower ones handle data transport. The purpose of the OSI model is to
define a framework for inter-computer communication. However, it does not provide
communication methods; the actual communication is realized by means of protocols,
with each protocol implementing the functions of one or more layers.

The seven OSI layers use various forms of control information - defined by the layer
protocol and appended to upper layer data - to communicate with their peer layers
residing in other computer systems. Control information may be appended either to the
beginning or the end of higher layer data in order to form a packet, the autonomous unit
of information transmission. Control data at the beginning of the packet is the header,
whereas data appended to the end of higher layer data is the trailer.

Encapsulation is the process of wrapping higher layer data into the current layer's packet
format. Practically speaking, each layer receives higher layer packets and encloses
them in its own packets, adding the corresponding headers and trailers. Decapsulation
is the reverse process of removing the current layer's headers and trailers. Each layer
in the source system adds packet headers to data, whereas the peer layer in the
destination system analyzes and removes the packet headers from that data.

Figure 6-1 The concept of encapsulation
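The wrapping and unwrapping of higher layer data can be expressed as a minimal sketch (placeholder header and trailer strings, not real protocol fields):

```python
# Minimal model of encapsulation/decapsulation: each layer wraps the
# higher layer's packet between its own header and trailer, and the
# peer layer strips them off again.

def encapsulate(upper: bytes, header: bytes, trailer: bytes = b"") -> bytes:
    return header + upper + trailer

def decapsulate(packet: bytes, header: bytes, trailer: bytes = b"") -> bytes:
    assert packet.startswith(header) and packet.endswith(trailer)
    return packet[len(header):len(packet) - len(trailer)]

# Two layers of encapsulation, then two of decapsulation, recover the
# original application data.
app_data = b"user data"
frame = encapsulate(encapsulate(app_data, b"IP|"), b"ETH|", b"|FCS")
assert decapsulate(decapsulate(frame, b"ETH|", b"|FCS"), b"IP|") == app_data
print(frame)
```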

Encapsulation enables the Generator to assemble valid packets. The test tool software
constructs packet headers for a particular user, calculating appropriate values for each
header field. The Generator simply fetches the packet headers for a user, depending on
the protocol stack his User Profile dictates, and places them on top of each other, with
the lower layer header as the outermost one. The Analyzer on the other hand uses
decapsulation to isolate and parse packet headers.

13 The model was developed by the International Organization for Standardization (ISO) in 1984.

6.2 Gi Interfacing and Measuring System


The Gi interfacing and measuring system is used to perform conformance and stress
testing over the Gi interface for a Gateway GPRS Support Node (GGSN). It emulates
the behavior of a PDN towards the GGSN; the GGSN is considered as a router with
which data and/or signaling traffic is exchanged. The system offers the desired
functionality by means of two hardware modules:

• the Generator,

• the Analyzer.

The Generator is responsible for generating traffic towards the GGSN, whereas the
Analyzer retrieves packets received from the GGSN, parses them and gathers statistics
based on their header information. Traffic generation is realized as the process of
assembling “virtual” user packets. For each “virtual” user to be activated during
simulation, the test tool software constructs packet headers, which are downloaded to
master board memory. The actual protocol stack for each active user is configured by
the operator by means of User Profiles (UPs).

Figure 6-2 Transmit traffic events, user profiles and traffic profiles

Recall from chapter 4 that each active user is associated with a TP and a UP, leading to
the creation of transmit Traffic Events (TEs). A transmit TE accurately describes the
packet that should be sent on behalf of a particular user, from the protocol headers it
should comprise to its payload length. A transmit TE is triggered whenever the timer
contained in the user's TP expires. The Generator then assembles the user packet,
utilizing information contained in the corresponding transmit TE to collect the necessary
packet headers residing in test tool memory. The packet assembled is complemented
with “dummy” payload and handed over for transmission. On the ingress side, the
Analyzer retrieves received packets and collects statistics on a per-protocol basis, i.e.
the number of packets containing a specific protocol header.

Packet transmission and reception is delegated to the Ethernet rear transition module,
which implements layer one and layer two functionality; namely encapsulating outgoing
packets into Ethernet frames and transmitting them, as well as receiving and error-
checking incoming packets. The rear transition module includes two devices, the IXF440
Ethernet controller and the transceiver. The IXF440 performs data link layer functions -
or else MAC processing - and the transceiver handles physical layer functions - the
packet transmission as a bit stream onto the wire. Both the Generator and the Analyzer
interface to the IXF440 chip by means of the IXBus and control signals. The IXBus is
configured in split mode, ensuring two autonomous 32-bit data paths, one for each
module. Control signals regulate access to the IXF440 transmit and receive FIFOs.

Generator and Analyzer logic is implemented in the Virtex-II devices present on the
board, with each module residing in a different FPGA. In general, the hardware
comprises the logic necessary to interface to the board memory and the IXF440 chip,
and to perform each module's internal tasks.

6.2.1 Configuration and Initialization


In chapter 3 we provided an overview of the test tool configuration process. This
subsection focuses on the Gi system-specific configuration parameters, with respect to
the protocol stack used over the Gi interface.

The test tool operator creates TPs, with each TP describing the traffic characteristics of
a user flow. TPs are configured in terms of packet size and data rate. The operator is
called to fill in the above taking into consideration the limitations of the physical
connection, which in our case is a 100Mbit/sec Ethernet on each physical port. For
example, if a TP has a packet size equal to 100 bytes and a data rate equal to
800Kbit/sec, then each user that utilizes the specific TP will generate 1000 packets per
second, i.e. one packet every millisecond. This will be the TP timer duration value.
Whenever the timer expires, every user that utilizes the TP should generate a packet
100 bytes long. As the test tool incorporates 4 physical ports, an aggregate of
n*100Mbit/sec bandwidth is provided, where n is the number of physical ports connected
to the network. Therefore the maximum bandwidth provided is 400Mbit/sec.
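The timer-duration arithmetic in the example above is simple enough to capture in a couple of lines (our own illustration of the calculation, not test tool code):

```python
# Derive a Traffic Profile's timer duration from its packet size and
# data rate: time between packets = packet bits / data rate.

def timer_duration_s(packet_size_bytes: int, data_rate_bps: float) -> float:
    """Seconds between packet generations for one user flow."""
    return packet_size_bytes * 8 / data_rate_bps

# 100-byte packets at 800 Kbit/s: one packet every millisecond,
# i.e. 1000 packets per second per user.
assert timer_duration_s(100, 800_000) == 0.001
assert round(1 / timer_duration_s(100, 800_000)) == 1000
```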

A UP is configured in terms of Point-to-Point Protocol, Layer 2 Tunneling Protocol, User
Datagram Protocol, user Internet Protocol and Ethernet frame parameters, depending on
the actual Gi interface protocol stack. The test tool includes two variations:

• L2TP signaling,

• no L2TP signaling.

The “L2TP signaling” mode assumes the full Gi protocol stack described in chapter 2,
and therefore the UP configuration parameters define settings for all of the
aforementioned protocols. According to the L2TP signaling traffic scenario, mobile
subscriber packets are encapsulated in PPP frames, which in turn are enclosed in L2TP
packets to traverse IP-based PDN backbones. When no L2TP signaling is used, user
packets are directly encapsulated inside Ethernet frames; consequently, the “transport”
IP, UDP, L2TP and PPP layers are not part of the Gi protocol stack, and the UP
configuration parameters define settings only for the Ethernet and user IP layers. Due to
the fact that the “L2TP signaling” mode is a superset of the “no L2TP signaling” mode,
we will only refer to the configuration process of the former.


PPP parameters include setting the Maximum Receive Unit14, QoS parameter
negotiation, and selecting the authentication procedure - if the operator wishes to use
one - to be incorporated during PPP session establishment. L2TP parameters define the
number of L2TP tunnels to be formed between the test tool and the GGSN, as well as
the population of L2TP sessions to be carried within a tunnel. UDP, IP and Ethernet
MAC configuration defines the port numbers, IP addresses and Ethernet frame
addresses to be included in user packet headers respectively. Additionally, the operator
can define the values of certain IP packet header fields.

Furthermore, the UP regulates user activation. The operator can decide to activate users
in steps, in which case successive users will enter the simulation with a time interval
calculated as a random value between the minimum and maximum values given by the
operator. The alternative is to have users enter the simulation at a specified rate, e.g. 15
users per second. User activation can be delayed by a random time interval calculated
by the software, based on minimum and maximum delay values supplied by the
operator. Last but not least, UP configuration enables the operator to associate a UP
with a specific TP and to decide the kind of traffic the users utilizing the UP will generate.
Specifically, a UP can be configured to generate plain signaling and/or data traffic.

There are also configuration options for the transport IP and Ethernet layers used to
carry packets over IP-based PDNs. Specifically, the IP and MAC addresses of the test
tool and the GGSN can be provided as input to the application software. Once the
configuration process is completed, the operator sets up a specific test scenario. The
scenario consists of UPs, the number of users incorporating a particular UP, as well as
the time interval of the simulation for which each UP will be active. TP and UP
information is translated by the software into appropriate user packet headers and well-
defined data structures facilitating hardware operation. Finally, the packet headers along
with the data structures are downloaded to master board memory.

6.3 Information Organization and Memory Addressing


Traffic Profiles and User Profiles are used to create two essential data structures, the
first one utilized by the Generator's timer management module and the other by the user
profile management module. The timer manager data structures contain timer duration
and packet length values for each user. The user profile manager data structures are the
transmit Traffic Events. Transmit TEs describe the type of the tunneling and underlying
protocols that should be used for a particular user, as well as the TP utilized by the user.
The data structures are prepared by the test tool based on the configuration provided by
the operator. They are then downloaded to board memory - either to internal Xilinx block
SelectRAM or to external dual-port RAMs - under the control of the microprocessor and
the board PCI bridge. Moreover, the test tool software constructs packet headers for all
users, taking into consideration the UP they utilize, and downloads them to the board
RAMs.

The board is equipped with dual-port RAMs as well as memory resources provided by
the Xilinx FPGA. Timer management and transmit TE data structures, along with
statistics counters, are stored inside the Xilinx RAM, as they are the most frequently
accessed objects. Packet headers on the other hand are stored in external RAM.

14 The Maximum Receive Unit parameter is used to inform the peer of the maximum packet size the PPP
implementation can support.
Memory organization facilitates base addressing mode. All packet headers for a
particular protocol are stored in contiguous memory locations relative to a base address.
A protocol-specific packet header associated with a user can be retrieved by adding an
offset to the protocol-specific base address. The offset is derived from the user ID, which
uniquely identifies an active user during simulation. For example, if all UDP packet
headers have a length equal to mw memory words and they are stored in contiguous
memory locations starting at base address ba, then the first word of the UDP packet
header for the user having userID equal to usrID will be located at address addr

addr = ba + mw x usrID

Timer management data structures, transmit TEs and statistics counters are organized
in a similar fashion; namely in contiguous memory locations relative to a base address.
The user ID we referred to earlier is the relative position of the user's transmit TE to the
transmit TEs base address. Additionally, there is a TP ID formed in an analogous
manner - using the relative position of the timer management data structure to the
related data structures' base address. Base addressing memory reference was preferred
over other techniques due to the fact that it minimizes packet header and data structure
lookup time. Furthermore, the overall memory organization presented eliminates the
need to have explicit user and TP IDs stored in board memory.
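The base-addressed lookup of the formula addr = ba + mw x usrID reduces to one multiply and one add per access, which is why it minimizes lookup time. A sketch with made-up base addresses and header lengths (not the test tool's actual memory map):

```python
# Base-addressed header lookup: all headers of a protocol sit in
# contiguous memory at a per-protocol base address, so a user's header
# starts at base + header_length_in_words * userID. The base addresses
# and sizes below are invented illustration values.

HEADER_WORDS = {"udp": 2, "ip": 5}           # header length in memory words
BASE_ADDR = {"udp": 0x1000, "ip": 0x2000}    # per-protocol base addresses

def header_addr(protocol: str, user_id: int) -> int:
    """Address of the first word of `protocol`'s header for `user_id`."""
    return BASE_ADDR[protocol] + HEADER_WORDS[protocol] * user_id

assert header_addr("udp", 0) == 0x1000       # user 0 sits at the base
assert header_addr("udp", 3) == 0x1000 + 2 * 3
assert header_addr("ip", 2) == 0x2000 + 5 * 2
```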

6.4 Configurable Architectures


The Generator and Analyzer architectures are configurable in terms of parallel engines
within modules, Xilinx RAM block utilization, memory depths and address bus widths.
The parameterized components approach enforces reusability policies and facilitates the
development of component libraries.

The reusability concept is of particular importance to companies, due to the fact that it
reduces design and implementation cycles. Furthermore, the individual modules need
only be tested during their implementation and can then be reused as proven-
functionality components in new architectures. Last but not least, parameterized
components allow the same design to be implemented in FPGAs having different
resources with minimum effort.

The Generator and Analyzer designs will initially be presented in a generic manner,
incorporating configurable hardware resources. After describing the parameterized
components, the instances of the corresponding architectures will be presented, based
on design constraints and test tool throughput requirements.

6.5 Generator Hardware Architecture


The basic step in designing hardware architectures is identifying the tasks that should be
performed and organizing their execution into modules with well-defined interfaces
between them. The data structures the test tool utilizes constitute a starting point for the
identification of the Generator tasks. First of all, all timers present in the system need to
be managed; recall that packet generation is triggered by expired timers. Each expired
timer must be located, and the TP IDs incorporating the expired timer should be
retrieved. Furthermore, the Generator must examine all user profile data structures in
order to identify the ones utilizing the particular TPs, and therefore the user IDs for which
packets should be assembled and scheduled for transmission. Each data structure is a
transmit TE, describing the protocol headers the resulting packet should comprise. The
transmit TE needs to be decoded and the packet must be assembled using the
predefined user packet headers residing in board RAMs. Finally, the packet must be
handed over to the Ethernet MAC chip for transmission.

The aforementioned tasks point out the basic hardware modules that should compose
the Generator. A timer manager, a TP searching engine, a UP searching engine and a
packet assembly and transfer engine should be designed to cover Generator
functionality. Due to the memory organization described earlier, the timer manager and
TP searching engine can be collocated in the same module, since a TP ID is inherently
related to the timer data structure memory address.

Nevertheless, there are additional issues to consider. On the one hand, we should recall
that timer duration values are expressed in ticks, whereas the only clock present on the
test tool board is the system digital clock. Consequently, there should be a module
keeping track of simulation time. On the other hand, the process of assembling and
transferring a packet to the Ethernet MAC chip, along with the actual packet
transmission, could be significantly slower than finding the transmit TE that caused the
datagram generation. The simplest scenario is that a transmit TE is triggered, and that
during packet assembly a new transmit TE that needs to be served is found. If there is
no buffering of information between the UP searching engine and the packet assembly
and transfer engine, the second transmit TE will be ignored. An analogous situation may
result from the MAC controller FIFOs being full, in which case packets cannot be
transferred to the IXF440 chip. Therefore a module that stores and manages transmit
TEs is required, so that as few transmit TEs as possible are discarded.

6.5.1 Building Blocks


Having identified the main Generator tasks, as well as the secondary tasks assisting the
primary ones, we will provide an in-depth description of the actual modules
implemented, emphasizing the design decisions behind them. The building blocks of the
Generator are:

• ticks counter,

• traffic profile engine,

• user profile engine,

• transmit scheduler,

• transmit data pump.

The Ticks Counter is responsible for keeping track of simulation time and informing the
Traffic Profile Engine that time equal to a tick has elapsed. The Traffic Profile Engine
(TPE) locates TPs whose timers have expired and supplies the User Profile Engine with
the TP ID and packet length. The TPE incorporates the Timer Management Engine
(TME), which handles the TPs' timer durations. The TME must decrement as many
timers as possible within a tick, identify expired timers and initiate the packet generation
procedure. Based on the information provided by the Traffic Profile Engine, the User
Profile Engine examines its corresponding data structures, finds the ones incorporating
the Traffic Profiles whose timer has expired and triggers transmit TEs. Transmit TEs are
buffered in a First-In First-Out queue, managed by the Transmit Scheduler, where each
entry contains all the information needed to assemble a user packet.

Figure 6-3 Generator block diagram

Finally, the Transmit Data Pump assembles user packets based on Transmit Scheduler
FIFO entries. It collects the appropriate packet headers from the header RAMs and
complements them with dummy payload in order to expand the packet size to the value
defined in the Traffic Profile. Moreover, it delivers the assembled packets to the MAC
controller - the Intel IXF440 chip.
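The assembly step can be summarized behaviourally as follows (a software sketch, not the VHDL implementation; the header byte strings are placeholders):

```python
# Behavioural model of the Transmit Data Pump's assembly step:
# concatenate the user's pre-built headers, outermost (lowest layer)
# first, then pad with dummy payload up to the Traffic Profile's
# packet length.

def assemble_packet(headers: list, packet_len: int,
                    filler: int = 0x00) -> bytes:
    stack = b"".join(headers)                 # lower layers outermost
    if len(stack) > packet_len:
        raise ValueError("headers exceed the configured packet length")
    return stack + bytes([filler]) * (packet_len - len(stack))

pkt = assemble_packet([b"ETH-", b"IP-", b"UDP-"], 20)
assert len(pkt) == 20
assert pkt.startswith(b"ETH-IP-UDP-")         # headers, then dummy payload
```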

In the hardware architecture description that follows, we will first present a generalized
hardware architecture for each module, configurable in Xilinx resource utilization, then
perform system tuning in terms of performance and FPGA area occupation, and finally
decide the instance of the architecture based on the test tool requirements.

6.5.2 Ticks Counter


System time is measured in ticks, the reference unit by which time advances during
simulation. The actual duration of a tick is a configurable multiple of the period of the
digital clock utilized by the system.

The Ticks Counter is the module responsible for managing the simulation clock and
signaling the completion of simulation clock cycles. It consists of a programmable
counter that generates a pulse on every tick. The number of clock cycles per tick is
configurable and maintained in a register within the Generator. The Ticks Counter
operation is straightforward; it preloads the value contained in the aforementioned
register and decrements it by one in every digital clock cycle. When the counter reaches

a value of zero it generates a pulse, indicating the completion of a simulation cycle and
reloads the counter.
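The preload-decrement-pulse behavior described above can be modeled in software. The following Python sketch is a behavioral approximation of the VHDL module, not the implementation itself; the clocks-per-tick value in the example is chosen arbitrarily.

```python
class TicksCounter:
    """Behavioral model of the Ticks Counter: a programmable down-counter
    that emits a pulse every `clocks_per_tick` digital clock cycles."""

    def __init__(self, clocks_per_tick):
        self.clocks_per_tick = clocks_per_tick   # value held in the Generator register
        self.count = clocks_per_tick             # preloaded counter

    def clock_edge(self):
        """Simulate one digital clock cycle; return True on a tick pulse."""
        self.count -= 1
        if self.count == 0:
            self.count = self.clocks_per_tick    # reload for the next simulation cycle
            return True                          # pulse: one simulation cycle completed
        return False

# Example: a tick every 4 clock cycles.
tc = TicksCounter(4)
pulses = [tc.clock_edge() for _ in range(8)]
# pulses -> [False, False, False, True, False, False, False, True]
```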

6.5.3 Traffic Profile Engine


The Traffic Profile Engine functionality can be divided into two distinct parts

• timer management,

• communication of Traffic Profiles information to the User Profile Engine.

The TPE design is based on two modules: a timer manager and a module that finds and
supplies the information assisting the User Profile Engine in locating users for which
packets must be generated. In chapter 4, we presented various architectures for
realizing a timer manager, concluding with the reasons that led to the selection of the
distributed timer manager approach.

The operation of the timer manager is straightforward. During one simulation cycle - a
tick - the timer manager must examine and assign a new value to decrementing
counters, as well as serve expired timers. To that end, it fetches timer duration
values from memory, decrements them by one and stores them back to memory.
According to the base addressing mode presented earlier, the position at which a
particular timer duration is stored in memory indicates the TP ID that uses it. In case a
timer has expired, all timer management engines suspend execution; the UPE interfacing
module retrieves the ones that have yielded a match, communicates the TP packet length
and TP ID to the UPE and increments a counter indicating the total number of current
matches. If the number of matches is below a predefined threshold - which depends on
the test tool throughput requirements - the module instructs the timer management
engines to resume execution; otherwise they remain halted. When the new system tick
begins, the engines continue decrementing counters from the point at which they were
halted. Additionally, timer managers reload expired timers with their preset value. All
TP-related data structures along with timer preset values are stored in internal Xilinx
RAM memory.

We can further increase timer manager performance by “stretching” the design, thus
exploiting parallelism. Specifically, we can use tme timer management engines, each
one responsible for t timers

t = (max number of TPs) / tme.

Each timer group is stored in a different block SelectRAM so that the timer management
engines can have independent, exclusive access to the timer set they are in charge of.
We need a total of mw 32-bit memory words to store all timer management data
structures

mw = (max number of TPs) x sizeof(TP data structure)

where sizeof(TP data structure) is in 32-bit memory words. Having

max number of TPs = 4096 and sizeof(TP data structure) = 2,

we get a total of 8192 32-bit words. Each block SelectRAM can be configured as a RAM
having 512 words, 36 bits wide each. Therefore, we can have tme timer management
engines

tme = 8192/512 ⇒ tme = 16,

each one with a private block SelectRAM holding the corresponding TP data structures.
More timer management engines can be deployed at the cost of increased Xilinx RAM
utilization, since each extra engine requires one RAM block. If fewer engines are
deployed, there is no gain in RAM utilization - the TP data structures still occupy the
same amount of space - but less FPGA area is occupied by the timer management
logic. Additionally, max number of TPs words are needed to store timer preset values.
If we assume tme timer management engines, each engine will control t timers
timers

t = 4096 / tme.

In general, the basic requirement is that the tme timer management engines will be able
to search all timers they are responsible for within a tick, enabling the TPE to serve at
least one timer expired per timer manager in the worst case.
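The sizing arithmetic above can be collected into a short calculation. The figures below (4096 TPs, 2-word TP data structures, 512-word block SelectRAMs) are the ones used in the text; the snippet merely reproduces that derivation.

```python
# Sizing the distributed timer manager, following the derivation in the text.
MAX_TPS = 4096        # maximum number of Traffic Profiles
TP_WORDS = 2          # sizeof(TP data structure) in 32-bit words
BLOCK_WORDS = 512     # block SelectRAM configured as 512 x 36

mw = MAX_TPS * TP_WORDS       # total 32-bit words for TP data structures
tme = mw // BLOCK_WORDS       # timer management engines, one private block each
t = MAX_TPS // tme            # timers each engine is responsible for

print(mw, tme, t)   # 8192 16 256
```

With one private block SelectRAM per engine, each of the 16 engines must be able to scan its 256 timers within a tick.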

6.5.4 User Profile Engine


The User Profile Engine manages user transmit TEs. Transmit TEs are data structures
that describe user traffic characteristics, most importantly the TP a user utilizes and the
protocols constituting user traffic. They are the hardware representation of user profiles
configured by the operator. Transmit TEs are triggered by the TP timer expiration. The
task of the UPE is to examine all transmit TEs within a tick and find the ones that utilize
the TP whose timer has expired. Once it finds a match, it supplies the Transmit
Scheduler with a FIFO entry. Each FIFO entry consists of a user ID, the transmit TE and
the TP packet length.

Transmit TEs are placed in Xilinx RAM in sequential addresses, starting from a base
offset address. The actual position of each transmit TE indicates the userID, for which
packet headers will be fetched from external RAM. For example address 0x0000
contains the transmit TE of user 0, 0x0001 the transmit TE of user 1, etc.

The UPE sequentially accesses memory locations containing transmit TEs and
compares the TP ID they incorporate against the ones supplied by the TPE. Once it
finds a match, it provides the Transmit Scheduler interfacing module with the data
necessary to fill in a FIFO entry.

To store the information contained in transmit TEs the UPE requires

(max number of users) x sizeof(transmit TE),

where sizeof(transmit TE) is in 32-bit memory words. Having

max number of users = 4096 and sizeof(transmit TE) = 1,

yields a total of 4096 words. Each block SelectRAM can be configured as a RAM having
512 words, 36 bits wide each. Therefore, xrb Xilinx RAM blocks are needed

xrb = 4096/512 ⇒ xrb = 8.

The transmit TE search time increases linearly with the number of transmit TEs a
UPE is responsible for. We can shorten the time a UPE needs to search the
entire transmit TE list by reducing the number of transmit TEs present in each list,
namely by deploying more UPEs. For each extra UPE, an additional Xilinx RAM block is
needed; each RAM will contain tte entries

tte = (max number of users) / xrb,

where xrb is the total number of Xilinx RAM blocks. Deploying more UPEs that have
to share a memory block with existing ones will not improve total UPE
performance. On the contrary, it will induce delay, as the bottleneck will be the access
time to the shared memory segment.

Last but not least, we should consider the fact that the Transmit Scheduler FIFO is a
common resource for all UPE engines and therefore a FIFO access synchronization
mechanism should be included in the hardware design. The synchronization mechanism
selected is simple and straightforward. Once a UPE yields a match, it informs a Transmit
Scheduler interfacing module. The module suspends execution of all UPEs, compares their
data to the TP IDs to find the actual transmit TEs that yielded the match and retrieves
the user IDs utilizing the particular TP. As soon as it supplies all matching entries to the
Transmit Scheduler, it instructs the UPE engines to resume execution.
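The match scan performed by a UPE can be sketched behaviorally. The Python model below is an illustration only (the real module is VHDL); it reduces a transmit TE to the TP ID it references and exploits the base addressing scheme, where the position of a TE doubles as the user ID.

```python
def upe_scan(transmit_tes, expired_tp_ids, packet_lengths):
    """Behavioral sketch of one UPE pass: yield Transmit Scheduler FIFO
    entries (user ID, transmit TE, packet length) for every TE whose
    Traffic Profile timer has expired."""
    entries = []
    for user_id, te_tp_id in enumerate(transmit_tes):   # address == user ID
        if te_tp_id in expired_tp_ids:                  # TP ID match
            entries.append((user_id, te_tp_id, packet_lengths[te_tp_id]))
    return entries

# Users 0..3 and the TP each one utilizes; TP 7 has just expired.
tes = [7, 3, 7, 1]
fifo_entries = upe_scan(tes, {7}, {7: 1500})
# fifo_entries -> [(0, 7, 1500), (2, 7, 1500)]
```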

6.5.5 Transmit Scheduler


The Transmit Scheduler (TxSched) acts as a buffer inside the Generator, storing and
managing transmit TE requests arriving from the UPE. The requests are maintained in a
First-In First-Out queue until the Data Pump can service them. The Data Pump fetches
FIFO queue entries, one at a time, and assembles packets based on the information
each entry contains. The relationship between FIFO entries and packets assembled is
one-to-one; for each FIFO entry exactly one packet is assembled and transmitted.

Transmit TEs can be served according to priority schemes or special scheduling
algorithms. As we have already mentioned, our implementation assumes a non-
prioritized execution environment; requests are served on a First Come First Served
basis. In essence, the TxSched is a First-In First-Out (FIFO) queue manager. Each
queue entry contains the user ID for whom a packet will be generated, the total length of
the packet and the actual transmit TE data structure. This is all the information the Data
Pump needs to assemble the packet. FIFO queue management tasks include

• regulating access to the FIFO between the Data Pump and the UPE,

• flow control between the FIFO and the user profile manager,

• flow control between the FIFO and the Data Pump.

The FIFO is implemented in block RAMs configured in dual-port mode, ensuring that
each hardware module has independent access to the FIFO using a different port.
Furthermore, each FIFO transaction requires a handshaking procedure between the
module requesting access and the TxSched. Flow control is achieved by means of a
high watermark level and an empty flag. On the one hand, the high watermark level flag
informs the user profile manager that the FIFO is about to overflow and it should
therefore stop feeding FIFO entries to the TxSched. On the other hand, the empty flag
synchronizes transactions with the Data Pump, preventing it from fetching invalid FIFO
entries and putting it in “sleep mode” until new entries arrive.

The TxSched FIFO occupies

((number of FIFO entries ) x sizeof(FIFO entry)) / 512

Xilinx RAM blocks, when the blocks are configured as 512 x 36 RAMs - sizeof(FIFO
entry) is in 32-bit memory words. Each FIFO entry requires 3 memory words to be
stored, one for each element comprising it. To further increase parallelism, we have
each FIFO entry element type stored in a separate block SelectRAM. In this way, an
entire entry can be fetched within a single clock cycle. The exact FIFO queue size
should be calculated according to the throughput of the test tool in terms of traffic
transmission - bits per second.

Additionally, TxSched behavior should be deterministic. FIFO overflows should not affect
Generator functionality. If the FIFO becomes full, all arriving transmit TE requests are
discarded until room in the FIFO becomes available. This is a scenario fitting the needs
of our design; a prioritized scheme could incorporate queue entry replacement
algorithms to give precedence to high priority user traffic.
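The flow-control behavior of the TxSched FIFO (high watermark toward the UPE, empty flag toward the Data Pump, silent discard on overflow) can be sketched as follows. This is a software model under assumed depth and watermark values, not the VHDL queue itself.

```python
from collections import deque

class TxSchedFIFO:
    """Sketch of the TxSched queue: high-watermark flag toward the UPE,
    empty flag toward the Data Pump, silent discard when the queue is full."""

    def __init__(self, depth, high_watermark):
        self.depth = depth
        self.high_watermark = high_watermark
        self.q = deque()

    @property
    def high(self):
        # Warns the user profile manager to stop feeding entries.
        return len(self.q) >= self.high_watermark

    @property
    def empty(self):
        # Puts the Data Pump in "sleep mode" until entries arrive.
        return not self.q

    def push(self, entry):
        if len(self.q) < self.depth:
            self.q.append(entry)
        # else: the arriving transmit TE request is discarded;
        # Generator functionality is unaffected.

    def pop(self):
        return self.q.popleft() if self.q else None

f = TxSchedFIFO(depth=4, high_watermark=3)
for e in range(6):
    f.push(e)          # entries 4 and 5 are discarded (FIFO full)
# f.high -> True; f.pop() -> 0
```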

6.5.6 Transmit Data Pump


The Transmit Data Pump (TxDP) is the module that forms the actual packet to be
transmitted and delivers it to the chip implementing Ethernet MAC layer functionality. Its
operation is based on FIFO entries residing in the Transmit Scheduler. The TxDP is
responsible for

• fetching packet headers from memory,

• attaching “dummy” payload to packet headers,

• delivering the packet to the IXF440 chip.

Its functionality is quite similar to the one offered by a Direct Memory Access (DMA)
engine. DMA engines are specialized hardware that can handle large data transfers from
one memory region to another, without the need to be supervised by a central
processing unit or any other kind of supervisory circuitry. In memory-mapped systems,
peripherals, data registers and control registers are accessed as if they were memory

locations, thus expanding the use of DMA engines beyond physical memories. On the
other hand, DMA engines do not provide flexibility and additionally require initialization of
every data transfer they perform. For these reasons, they are utilized in cases where

• relatively large amount of data needs to be transferred

• the data to be transferred resides in contiguous memory locations

• the frequency of data transfers is high

The above remarks have led to the “mapping” of DMA functionality onto the TxDP. The
TxDP must fetch a packet header for every transmit TE that has been triggered.
Protocol-specific packet headers reside in external RAM; their length varies from 4 to 20
bytes. Memory words that form a packet header specific to a protocol and a user are
stored in contiguous memory locations. The packet transfer source and destination
addresses are external RAM and one of the eight IXF440 FIFOs, respectively.

For a DMA transfer to take place, the DMA engine must be initialized. Specifically, it
needs to know the memory address from which it will start fetching data, the amount of
data to be transferred and the address to which it will start storing data. Similarly,
Transmit Data Pump packet transfers have to be initialized. This task is delegated to the
Transmit Scheduler, which indirectly provides the TxDP with the memory address the
packet header for the user is located and to which IXF440 port it should be delivered.
The size of the data to be transferred is specific to the protocol. Additionally, there is
packet length information in the user TP, which indicates the amount of dummy data that
must be appended to the packet.

The TxDP does not handle data storage itself; it communicates each packet word to the
IXF440 MAC controller directly by means of the IXBus FIFO interface, thus avoiding
internal packet buffering. This hardware design decision was facilitated by the
incorporation of the Transmit Scheduler and the FIFO it deploys. Having presented both
modules, it is clear that storing and managing transmit events is much more efficient
and simpler than storing entire packets. Whenever packet transfer to the IXF440 FIFOs
is feasible, the TxDP assembles the packet inside the target IXF440 FIFO, storing each
packet word directly in the FIFO.
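The header-plus-dummy-payload assembly can be illustrated with a small sketch. The function below is a software approximation of what the TxDP streams onto the IXBus; the address, header length and packet length used in the example are illustrative values, not figures from the design.

```python
def assemble_packet(header_ram, header_addr, header_len, packet_len,
                    filler=0x00):
    """Sketch of TxDP packet assembly: fetch the stored header bytes from
    (external) header RAM and pad with dummy payload up to the packet
    length defined in the Traffic Profile."""
    header = bytes(header_ram[header_addr:header_addr + header_len])
    payload = bytes([filler]) * (packet_len - header_len)
    return header + payload

ram = bytes(range(64))   # stand-in for the external header RAM contents
pkt = assemble_packet(ram, header_addr=8, header_len=20, packet_len=64)
# len(pkt) -> 64; the first 20 bytes are the stored header
```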

The TxDP architecture is not resource consuming in terms of Xilinx RAM blocks. It does
not have any internal data storage, since it only implements the logic to interface to the
Transmit Scheduler and access the corresponding FIFO, the external RAMs holding
packet headers and the IXF440 chip.

The packet transfer from the TxDP to an IXF440 FIFO is performed using a handshake
procedure. The TxDP initially requests the IXF440 FIFO status to which it wishes to
transfer data. Each transmit FIFO has a transmit-ready signal indicating that there is
enough free space to load new data. The particular signals are asserted according to
predefined high-watermark levels the controller is programmed with; specifically, they
are only asserted if there are more bytes available in the FIFO than the high-watermark
level. In order to assert the transmit-ready indication of a particular FIFO the TxDP must
primarily use the signal indicating it wishes to find out the FIFO status and then specify a
FIFO putting its address on the port select bus. If the FIFO the packet must be

transferred to can accept data, the TxDP starts the packet transfer asserting the FIFO
write enable signal. On the other hand, if the FIFO has exceeded the high-watermark
level, the transfer is stalled until room becomes available.

After the aforementioned procedure is complete, the Data Pump can start placing packet
words onto the IXBus. As soon as it places the first word, it asserts the start of packet
signal to indicate to the IXF440 the beginning of a new packet transfer; the signal
remains asserted until the whole packet is stored to the FIFOs. Finally, upon placing the
last word, it asserts the end of packet signal to inform the controller that the whole
packet has been delivered. During each packet word transfer, the Data Pump also
needs to set the appropriate value to the IXBus mask signals, which indicate to the
IXF440 chip the valid bytes of each 32-bit word carried over the bus. In case the FIFO
fill level exceeds the high-watermark level while a packet transfer operation is in progress,
the controller deasserts the corresponding transmit-ready signal, notifying the Data Pump
of the event. The Data Pump then stalls the transfer, re-initiates the handshake
procedure and waits for the transmit-ready signal to be reasserted so as to resume its
operation.

Recall from chapter 4 that the IXBus is configured in split mode. The split mode
configuration achieves the highest throughput for our design, as it ensures independent
data paths for the Generator and the Analyzer. Each module has exclusive access to its
own data bus, avoiding the use of the FIFO interface as a shared resource, which would
further require bus access synchronization mechanisms - i.e. bus arbitration.

6.6 Analyzer Hardware Architecture


The Analyzer is responsible for fetching packets received from the MAC controller,
parsing them and collecting statistics based on their header information. Analyzer
functionality is based on a combination of decapsulation and multiplexing. Each layer's
packet encapsulates higher layer datagrams. However, different protocols may implement
a particular layer's functions, and therefore a protocol must not be tied to
a specific higher layer protocol, but must instead be able to carry all types of higher layer
packets. Multiplexing is the mechanism used to distinguish the higher layer
protocol to which the decapsulated packet should be delivered for further processing. It
is implemented as a special packet header field, whose value identifies the actual
higher layer protocol utilized.

The functional decomposition of the Analyzer indicates three major tasks that should be
performed. The Analyzer should fetch packets received by the MAC controller and
based on the protocol stack of the Gi interface, examine specific header fields. Finally,
depending on the field values the corresponding statistics should be updated. Recall
from chapter 4 that a status word follows every data transfer from the IXF440 FIFOs to
the Analyzer. The packet status word should also be parsed by the Analyzer, since it
contains information facilitating statistics collection.

The basic hardware modules the Analyzer should comprise are a packet transfer engine
interfacing with the IXF440 chip, a packet parsing and a statistics gathering engine.

6.6.1 Building Blocks


Analyzer functionality can be decomposed into modules, each implementing a particular
subset of tasks. What follows is an in-depth description of the actual modules designed,
emphasizing the underlying design decisions. The building blocks of the Analyzer are

• receive data pump,

• statistics fields parser,

• statistics fields manager,

• statistics engine.

The Receive Data Pump (RxDP) interfaces with the MAC controller. Its duties include
fetching packets from the IXF440 FIFOs and delivering them to the Statistics Field
Parser (SFP) along with the status word following each packet transfer. The SFP
receives packet header and status words, locates and isolates the specific fields used
for statistics gathering. Each field is then stored separately in the Statistics Fields
Manager (SFM). The SFM incorporates a FIFO queue to which the header and the
status word fields are stored until the Statistics Engine retrieves them. The Statistics
Engine maintains statistics on a protocol level, utilizing the actual values of status word
and packet header fields. Statistics are in essence a set of counters, incremented
according to occurrences of specific events. Finally, let us note that statistics
organization obeys the base addressing scheme described earlier in this chapter,
where contiguous memory locations contain statistics related to the same protocol.

[Block diagram: the IXF440 controller feeds data words to the RxDP; the RxDP hands header and status words to the SFP; the SFP stores the isolated statistics fields in the SFM FIFO; the SE drains the FIFO and updates the statistics counters held in on-chip RAM, which are read out via the microprocessor interface.]

Figure 6-5 Analyzer block diagram

The statistics gathered are visualized using the application software running on the host
processor. The test tool microprocessor generates an interrupt request once per
second, and the interrupt handler of the host initiates a transfer from the board statistics
memory to the application software memory. Once the transfer is completed, the
application software reads the updated information and displays it to the test tool
operator.

6.6.2 Receive Data Pump


The Receive Data Pump is the module that retrieves packets from the IXF440 FIFOs
and delivers them to the Statistics Fields Parser. It is the counterpart of the Generator
Transmit Data Pump. In essence, the Receive Data Pump is an approximation of a DMA
engine, where the transfer source is one of the controller FIFOs and the target is a 32-bit
wide register within the Statistics Field Parser. Compared to the Transmit Data
Pump, the transfer initialization process is simpler. The Analyzer need not fetch
packets from a specific port; therefore the transfer start address is any IXF440 queue
holding data above the predefined threshold. The actual FIFO is selected using a round
robin algorithm: the status of each FIFO is requested and the first FIFO holding
sufficient data for a transfer is selected. The destination address is a location within the
SFP.

Due to the fact that statistics are collected using packet header information, only packet
header fields are stored in the SFM FIFO. However, the whole packet needs to be
fetched from the controller FIFO to free the space it occupies. Furthermore, the
controller places the status word on the IXBus after the packet is completely removed
from the FIFO. The IXF440 start and end of packet signals indicate the beginning and
end of each transfer. Once the end of packet signal is asserted the RxDP knows that the
next 32-bit word received from the IXBus will be the packet status word.

The RxDP fetches packet headers from the controller FIFO in word bursts; the actual
burst size is configurable. Each word received is handed over to the SFP which
recognizes, isolates and stores the fields used for statistics gathering. Once the SFP has
finished processing all words retrieved in the burst, the RxDP retrieves the next ones.
The side-effect of the aforementioned scheme is that there needs to be a flow control
mechanism between the IXF440 device, the RxDP and the SFP. The SFP should not be
fed with new header words until it finishes processing the current ones and the RxDP
should not remain idle if the SFP has parsed and stored the current word fields.
Additionally, the transfers between the IXF440 controller and the RxDP should be
synchronized.

There are two flow control mechanisms needed; one to regulate transfers between the
IXF440 and the RxDP, and one to synchronize the RxDP with the SFP. The former
utilizes an IXF440 input signal, which must be asserted to enable controller receive
FIFO accesses. The RxDP deasserts the particular signal every time a burst transfer is
completed and reasserts it as soon as the SFP has parsed and stored the statistics-
related fields the words comprise. The latter mechanism is implemented as a handshake
procedure between the RxDP and the SFP.

The RxDP architecture is not resource consuming in terms of Xilinx RAM blocks, since it
just implements the interfacing logic to the IXF440 device and the SFP.

The packet transfer procedure from a IXF440 FIFO to the RxDP is analogous to the one
followed by the Transmit Data Pump to move a packet to a controller FIFO. The RxDP
initially polls the IXF440 FIFOs to decide the one containing sufficient data for a transfer
to be performed. Each receive FIFO has a receive-ready signal indicating that there is
enough data for a transfer to be initiated onto the IXBus. Similarly to transmit-ready,
receive-ready signals are asserted according to predefined low-watermark levels the

controller is programmed with; specifically, they are only asserted if there are more bytes
stored in the FIFO than the low-watermark level. The polling process is implemented
using a signal which indicates that the RxDP wishes to find out the receive status of a
FIFO, and then examining all FIFOs placing their address on the FIFO port select bus
until a particular one asserts its receive-ready signal. Once the RxDP finds a FIFO
holding sufficient data, it initiates the packet transfer asserting the FIFO read enable
signal.

Upon completion of the aforementioned procedure, the RxDP begins fetching packets
from the controller. The controller asserts the start of packet signal to indicate to the
RxDP the beginning of a new packet transfer; the signal remains asserted until the
whole packet has been placed on the IXBus. Finally, the IXF440 asserts the end of packet
signal when it places the last packet byte on the IXBus, informing the RxDP that the next
word it will receive will be the status word. The RxDP reads in the status word and hands it
over to the SFP. Both the start and end of packet indications are communicated to the
SFP to facilitate packet parsing and status word recognition.

In case FIFO bytes drop below the low-watermark level while a packet transfer operation
is in progress, the controller deasserts the corresponding receive-ready signal notifying
the RxDP of the event. The RxDP then stalls the transfer, re-initiates the handshake
procedure and waits for the receive-ready signal to be reasserted so as to resume its
operation. Last but not least, the RxDP also needs to examine IXBus mask signals
during each word transfer, to decide the valid bytes of each 32-bit word carried over the
bus.

Finally, we should note that IXF440 FIFOs are served using a Round Robin algorithm.
The RxDP polls all FIFOs sequentially and serves the first one holding data above the
low-watermark level. As soon as the current FIFO is served, the RxDP restarts the
polling procedure from the next one. The particular scheme fits the needs of the design
since the overall architecture does not impose any priority limitations. In case
prioritization needs to be applied - i.e. if each IXF440 FIFO holds packets having
different QoS requirements - the status of all FIFOs could be examined before deciding
the actual one to serve, and the final selection could be based on a different algorithm
giving precedence to certain FIFOs over others.
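The round robin selection described above can be sketched in a few lines. The following Python model is an illustration of the polling order only; the real RxDP drives the FIFO port select bus and samples receive-ready signals.

```python
def next_ready_fifo(ready_flags, start):
    """Round-robin selection of the next IXF440 receive FIFO whose
    receive-ready flag is asserted; returns its index, or None if no
    FIFO currently holds data above the low-watermark level."""
    n = len(ready_flags)
    for offset in range(n):
        idx = (start + offset) % n       # wrap around the eight FIFOs
        if ready_flags[idx]:
            return idx
    return None

# FIFOs 2 and 4 hold sufficient data; polling restarts after each service.
ready = [False, False, True, False, True, False, False, False]
first = next_ready_fifo(ready, start=0)          # serves FIFO 2
second = next_ready_fifo(ready, start=first + 1) # then FIFO 4
```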

6.6.3 Statistics Fields Manager


The SFM stores and manages packet headers and the corresponding packet status
fields parsed by the SFP. It acts as a buffer inside the Analyzer, maintaining information
in a First-In First-Out queue until the Statistics Engine can retrieve it and update the
corresponding statistics counters. As mentioned earlier, each FIFO entry includes a
programmable number of 32-bit words, which include all packet header and status word
fields examined for statistics gathering. The SFM FIFO is a common resource for the
Statistics Fields Parser and the Statistics Engine. Therefore, the SFM is delegated queue
management tasks, the most important of which are

• regulating access to the FIFO between the SFP and the Statistics Engine,

• controlling the data flow between its FIFO and the SFP,

• controlling the data flow between its FIFO and the Statistics Engine.

Generally, the FIFO management tasks are similar to the ones performed by the
Transmit Scheduler. The FIFO is implemented in dual-ported block SelectRAMs, having
each hardware module access the FIFO through a different port. FIFO transactions
require a handshaking procedure between the module requesting access and the SFM.
Flow control is based on threshold levels and accompanying status indication flags. A
high watermark level flag informs the RxDP that the FIFO can further accept a limited
number of bytes. The empty flag synchronizes transactions with the Statistics Engine,
preventing it from fetching invalid FIFO entries and putting it in “sleep mode” until new
entries arrive.

The SFM FIFO occupies

((number of FIFO entries ) x sizeof(FIFO entry)) / 512

Xilinx RAM blocks, when the blocks are configured as 512 x 36 RAMs - sizeof(FIFO
entry) is in 32-bit memory words. The size of the FIFO entry depends on the number of
packet header and status words fields used for statistics and therefore on the actual
protocol stack deployed. The Analyzer collects statistics using a total of 19 fields and/or
field sets when the full Gi protocol stack is deployed. Consequently a SFM FIFO entry
contains 19 32-bit memory words. To further increase parallelism, the header fields
retrieved by different word bursts can reside in different block SelectRAMs. In this way,
the Statistics Fields Parser functionality can be distributed to autonomous modules, each
one responsible for parsing fields located inside a particular burst. The number of Xilinx
RAM blocks in that case would be equal to the number of SFP engines, to ensure that
each engine would have exclusive access to a private RAM block. The exact FIFO
queue size and organization should be calculated according to the throughput of the test
tool in terms of traffic reception - bits per second.
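Applying the block-count formula above to the 19-word entry gives a concrete figure. The FIFO depth of 128 entries in this sketch is an assumed value for illustration, not one taken from the design.

```python
import math

ENTRY_WORDS = 19      # statistics fields per entry with the full Gi stack
BLOCK_WORDS = 512     # block SelectRAM configured as 512 x 36
fifo_entries = 128    # assumed queue depth (illustrative)

blocks = math.ceil(fifo_entries * ENTRY_WORDS / BLOCK_WORDS)
print(blocks)   # 5 block SelectRAMs
```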

6.6.4 Statistics Fields Parser


The Statistics Fields Parser receives packet header and status words and prepares
them for the Statistics Engine. SFP functionality is built on the concept of decapsulation.
Decapsulation is the process of gradually removing packet headers up to the point
where the packet contains plain data. In order to remove all packet headers leaving the
data intact, an entity must be aware of the protocol stack, namely the total number of
headers a packet comprises and their corresponding structure. The protocol stack
deployed is indirectly included in the test tool configuration parameters, as the operator
has the option to incorporate PPP and L2TP or transport user packets directly over
Ethernet.

The SFP receives packet header words in bursts, storing and processing current burst
words before fetching new ones. The burst size is a configurable parameter, and
depends on the protocol stack used, as well as the total number of SFP engines used. In
order to increase the SFP throughput, multiple parsing engines can be deployed, each
one responsible for isolating the header fields within the words of a particular burst.
Header fields are isolated using bitmasks stored in Xilinx RAM blocks. There are a total
of sfpe RAM blocks needed, where sfpe is the number of SFP engines. Each RAM will
hold the bitmasks used by the specific engine.

Statistics are maintained in the form of counters, updated according to the occurrence of
specific events. The actual events are related to data contained inside packet header
fields. Therefore, the SFP must locate and isolate all fields used for gathering statistics
inside a particular header word, align them on a 32-bit boundary and store them to the
SFM FIFO. The exact fields the SFP must isolate are inherently related to the protocol
stack utilized. Additionally, the SFP maintains bit masks used to isolate a field from
adjacent ones present in the same word.
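The mask-based field isolation can be illustrated with a short sketch. The masks and shifts below correspond to well-known IPv4 header fields and are given only as an example of the mask-and-align operation the SFP performs in hardware.

```python
def isolate_field(word, mask, shift):
    """Isolate one statistics field from a 32-bit header word using a
    stored bitmask, then right-align it (alignment on a 32-bit boundary)."""
    return (word & mask) >> shift

# Example: the first IPv4 header word (version, IHL, TOS, total length).
word = 0x45000054
version = isolate_field(word, 0xF0000000, 28)   # -> 4
ihl     = isolate_field(word, 0x0F000000, 24)   # -> 5 (20-byte header)
length  = isolate_field(word, 0x0000FFFF, 0)    # -> 84 bytes
```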

Therefore, the SFP must be aware of the amount of data it should parse. Based on the
encapsulation concept described earlier, the first bytes of a packet payload will contain
the higher layer packet header. The SFP has to parse a number of packet headers
whose size is predefined. The actual size depends on the protocol stack utilized. For
example, if the protocol stack presented in chapter 2 is deployed, each packet will have

• Ethernet header (14 bytes)

• IP header (20 bytes)

• UDP header (8 bytes)

• L2TP header (16 bytes)

• PPP header (1 or 2 bytes)

• UsrIP header (20 bytes)

Consequently, there will be a maximum of 80 bytes of header information in each packet, and 20 32-bit words will have to be parsed. To provide a level of flexibility, the total number of 32-bit words to be parsed is a configurable parameter, depending on the protocol stack used. Additionally, the status word must be recognized, which requires detecting transfer completion. The IXF440 start and end of packet signals indicate the beginning and end of each transfer; once the end of packet signal is asserted, the SFP knows that the next 32-bit word received from the RxDP will be the packet status word.

The SFP word counter is decremented by one after every word parsing operation, until it reaches zero. When this happens, the SFP “knows” that the remaining words are packet data and ignores them.

Furthermore, the SFP incorporates a Header Checksum Engine (HCE), which implements the logic necessary to calculate the IP header checksum. The checksum calculated is stored in the Statistics Fields Manager FIFO. The SE compares the checksum computed by the HCE against the one included in the IP header and, in case of a mismatch, increments the corresponding counter. The IP header checksum verification is an example of the flexibility provided by the real-time parsing approach: the SFP can be delegated computations requiring the entire packet header and/or payload, without actually having to store all the information.
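The HCE computes the standard one's-complement checksum of RFC 791/1071. A software sketch, usable to cross-check the hardware (the function is illustrative; the sample header used in the tests is a common textbook example, not taken from the test tool):

```python
def ip_header_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words (RFC 791/1071).
    Pass the header with the checksum field zeroed to compute a checksum,
    or with the checksum in place to verify it (the result is then 0)."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total > 0xFFFF:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

The fold-and-complement structure maps directly onto a small adder tree in hardware, which is why the HCE can keep up with the parsing rate.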

- 83 -
Institute for Microelectronics and
UMTS Gi Iinterfacing and Measuring System
Information Technology

There is an additional issue to consider. Not all packet headers have predefined fields
and lengths. L2TP includes optional header fields whereas PPP has a single, variable-
sized field. Consequently, the SFP is further required to examine particular bits
indicating the presence of optional L2TP header fields and identify the exact size of the
PPP header. The aforementioned situation imposes limitations on the RxDP as well, due
to the fact that the total number of bytes comprising all headers present in a packet is
not fixed. To avoid any undesirable side-effects, the RxDP always supplies the SFP with
the maximum number of bytes a header may comprise, possibly carrying part of the
payload. It is the responsibility of the SFP to decide the actual header bytes.
Nevertheless, the current implementation assumes a default IP header of 20 bytes; IP
headers have a large number of extensions, part or all of which can be present though in
practice very few are used.

6.6.5 Statistics Engine


As mentioned earlier, statistics are realized using counters, updated upon the occurrence of particular events related to header and status word fields. The Statistics Engine (SE) is the module managing the counters. The SFP prepares the fields and stores them to the SFM FIFO. The 32-bit alignment of all fields ensures that each word fetched by the SE contains exactly one field. However, whenever statistics decisions are based on a group of fields, i.e. a set of bits, the entire group rather than a single field is 32-bit aligned. The SE simply fetches fields or field sets from the FIFO, examines their values and increments the corresponding counters when necessary. The SE is configured with the total number of fields it must examine for every packet. Once it has checked all field values, and provided the SFM FIFO is not empty, it restarts the counter update cycle.
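The SE update cycle can be modelled as below (a sketch: the check table, counter names and function shape are illustrative, with the SFM FIFO modelled as a Python deque):

```python
from collections import deque

# Hypothetical per-field checks: (counter name, predicate flagging an error).
# The real SE is configured with the number of fields per packet; here the
# length of this list plays that role.
CHECKS = [
    ("invalid IP headers",            lambda ihl: ihl < 5),
    ("invalid higher layer protocol", lambda proto: proto != 17),
]

def se_update_cycle(fifo: deque, counters: dict) -> None:
    """Fetch one 32-bit-aligned field per configured check and bump the
    corresponding counter when the check fails."""
    for name, is_error in CHECKS:
        field = fifo.popleft()
        if is_error(field):
            counters[name] = counters.get(name, 0) + 1
```

In the FPGA the counters live in dual-port RAM blocks rather than a dictionary, but the fetch-examine-increment loop is the same.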

Statistics counters are 32 bits wide and are stored in Xilinx RAM blocks, configured as 512 x 36 dual-port RAMs. One port is dedicated to the SE and the other to the microcontroller. The number of RAM blocks used depends on the number of counters maintained. Finally, we should note that the responsibility for avoiding counter overflows is delegated to the board microprocessor: the application software running on the microprocessor instructs the SE to reset all counters at regular time intervals.

Last but not least, it should be mentioned that multiple SE engines can be deployed, each one responsible for a counter subset. In this case, the total number of SE Xilinx RAM blocks needed equals the number of SE engines.

6.7 Statistics Fields and Counters


The header fields used to collect statistics depend on the protocol stack utilized. The test
tool provides the ability to use the full Gi protocol stack or a simplified one where mobile
subscriber IP packets are transported directly over Ethernet. The packet header fields
and statistics counters to be presented refer to the “L2TP signaling” mode. However, the
procedure is analogous for the “no L2TP signaling” mode.


Ethernet (14) | IP (20) | UDP (8) | L2TP (16) | PPP (1/2) | user IP (20)

Figure 6-7 Gi protocol stack incorporating L2TP and PPP (header sizes in bytes)

The full Gi protocol stack, as presented in chapter 2, is depicted in Figure 6-7. Providing a detailed description of all packet headers incorporated is outside the scope of this document; interested readers should refer to the related RFCs. The following sections focus on the particular fields used for statistics gathering, concluding with an example demonstrating SFP and SE functionality.

6.7.1 Internet Protocol Statistics


The first header inside an Ethernet frame payload is the IP header [25]. Table 6-1 summarizes the IP header fields of interest to the SE and the corresponding statistics counters.

Table 6-1 IP header fields and related counters

header field             statistics counter
Internet Header Length   invalid IP headers
Total Length             total IP bytes received
Protocol                 invalid higher layer protocol
Header Checksum          header checksum errors
Destination Address      invalid destination address

The Internet Header Length field specifies the length of the IP header in 32-bit words. For a header to be valid, this value must be at least 5; if it is less than 5, the SE increments the corresponding counter. The Total Length indicates the total length of the original IP packet - not just the length of the particular fragment. This matters in cases where the original IP packet has been segmented and carried inside multiple Ethernet frames; a bit inside the IP header, the More Fragments (MF) flag, indicates whether this is the last fragment of an IP packet or more fragments follow. However, it is common practice to avoid fragmentation for IP packets in the UMTS domain, so the Total Length represents the total number of IP payload bytes received.

The Protocol field specifies the next encapsulated protocol, which should be UDP. Each protocol is identified by a unique number; for UDP this is 17. If the Protocol field is not 17, the SE increments the invalid higher layer protocol counter. The Header Checksum is a 16-bit one's complement checksum of the IP header; it is reevaluated and compared against the received value, and the header checksum errors counter is incremented whenever the two are not identical. Finally, the Destination Address refers to the IP address of the station the packet must be delivered to. When L2TP is utilized, the station is the other end of the tunnel, represented by the test tool. The Destination Address is compared against the test tool


IP address and if there is a mismatch between the two, the invalid destination address
counter is incremented.

There are two additional counters based on generic information rather than header fields. The valid IP packets counter is incremented every time a header successfully passes all of the aforementioned checks; in case a packet header fails a check, the invalid IP packets counter is incremented instead. Obviously, the total number of IP packets received by the Analyzer is the sum of these two counters.
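The IP checks above can be summarised in a small software model (counter names follow the text; the function signature, including passing in the HCE-computed checksum, is an assumption of this sketch):

```python
def ip_statistics(ihl, total_length, proto, hdr_cksum, computed_cksum,
                  dst_addr, tool_addr, counters):
    """Apply the Table 6-1 checks for one IP header and update counters."""
    def bump(name):
        counters[name] = counters.get(name, 0) + 1

    # the Total Length is accumulated unconditionally
    counters["total IP bytes received"] = (
        counters.get("total IP bytes received", 0) + total_length)

    ok = True
    if ihl < 5:                          # minimum valid header length
        bump("invalid IP headers"); ok = False
    if proto != 17:                      # 17 = UDP
        bump("invalid higher layer protocol"); ok = False
    if hdr_cksum != computed_cksum:      # HCE result vs. received value
        bump("header checksum errors"); ok = False
    if dst_addr != tool_addr:            # must be addressed to the test tool
        bump("invalid destination address"); ok = False
    bump("valid IP packets" if ok else "invalid IP packets")
```

Note that the valid/invalid packet counters are mutually exclusive per packet, so their sum gives the total number of IP packets received.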

6.7.2 User Datagram Protocol Statistics


An IP packet encapsulates a UDP [26] header. Table 6-2 contains UDP header fields of
interest to the SE and the related statistics counters.

Table 6-2 UDP header fields and related counters

header field       statistics counter
Destination Port   invalid port
Length             invalid UDP headers
Length             total UDP bytes received

The Destination Port specifies the higher layer application running on the host to which the packet will be delivered; it uniquely identifies applications running on the same host. When UDP is used to carry L2TP data, the destination port should equal 1701, and the invalid port counter is incremented for every header whose destination port differs. The Length indicates the total length of the UDP header along with the payload data it contains; both the invalid UDP headers and the total UDP bytes received counters are based on this field. The Length value is added to the total UDP bytes received counter for every UDP header parsed, while the invalid UDP headers counter is incremented every time the Length value is less than eight, since this is the minimum value defined.

Similarly to IP statistics, there is a valid UDP packets counter incremented every time a
header successfully passes all of the aforementioned checks; in case a packet header
fails a check the invalid UDP packets counter is incremented instead.
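Analogously, a sketch of the UDP checks (illustrative function shape; counter names follow the text):

```python
L2TP_PORT = 1701        # registered UDP port for L2TP

def udp_statistics(dst_port, length, counters):
    """Apply the Table 6-2 checks for one UDP header and update counters."""
    def bump(name):
        counters[name] = counters.get(name, 0) + 1

    counters["total UDP bytes received"] = (
        counters.get("total UDP bytes received", 0) + length)

    ok = True
    if dst_port != L2TP_PORT:
        bump("invalid port"); ok = False
    if length < 8:                       # 8 bytes is the minimum UDP length
        bump("invalid UDP headers"); ok = False
    bump("valid UDP packets" if ok else "invalid UDP packets")
```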

6.7.3 L2TP Statistics


L2TP headers are carried within UDP packet payloads. The L2TP header fields utilized by the SE and the related statistics counters are summarized in Table 6-3.

Table 6-3 L2TP header fields and related counters

header field                     statistics counter
Message Type                     L2TP control packets
                                 L2TP data packets
Length Present                   invalid L2TP control packet headers
Sequence Present
Offset Present
Priority
Version                          invalid version L2TP headers
Length                           total L2TP control bytes received
                                 total L2TP data bytes received

The Message Type (T) is a bit indicating whether the particular packet is a data or control packet. If it is set to 1, the packet carries control information and the L2TP control packets counter is updated; otherwise the L2TP data packets counter is incremented. The Length Present, Sequence Present and Offset Present are one-bit fields indicating the presence of optional header fields - the implementation assumes that all L2TP header fields are present. The Priority is also a one-bit field, set when data messages should receive preferential treatment in queuing and transmission.

For L2TP control messages, the particular fields must have predefined values; specifically, per RFC 2661, the Length Present and Sequence Present bits must be set, whereas the Offset Present and Priority bits must be cleared. If an L2TP packet has been identified as a control message and the values of the aforementioned fields do not conform to the values specified, the invalid L2TP control packet headers counter is incremented. The Version field indicates the L2TP protocol version used and must be set to two; any packet containing a different version value causes the invalid version L2TP headers counter to increase by one. Finally, the Length conveys the total L2TP packet length in bytes - including the packet header. Depending on the packet type - data or control - the Length is added either to the total L2TP control bytes received or the total L2TP data bytes received counter.
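A compact model of the L2TP classification and validation rules, assuming the bits have already been isolated by the SFP and following RFC 2661 conventions (T, L and S set and O and P cleared for control messages, version 2):

```python
def l2tp_statistics(t, l, s, o, p, version, length, counters):
    """Classify one L2TP header and update the Table 6-3 counters.
    Arguments are the isolated header bits plus the Length field;
    the function shape is illustrative."""
    def bump(name):
        counters[name] = counters.get(name, 0) + 1

    if version != 2:
        bump("invalid version L2TP headers")
    if t:                                # T = 1 marks a control message
        bump("L2TP control packets")
        counters["total L2TP control bytes received"] = (
            counters.get("total L2TP control bytes received", 0) + length)
        # control messages must carry L = S = 1 and O = P = 0 (RFC 2661)
        if not (l and s and not o and not p):
            bump("invalid L2TP control packet headers")
    else:
        bump("L2TP data packets")
        counters["total L2TP data bytes received"] = (
            counters.get("total L2TP data bytes received", 0) + length)
```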

6.7.4 PPP statistics


Each L2TP packet encapsulates a PPP frame. The PPP frame structure is very simple,
as it contains a header either one or two bytes long and payload data. The header
comprises a single field called Protocol, whose value indicates the higher layer packet
type the PPP packet encapsulates. Table 6-4 depicts the Protocol field values
meaningful within the context of the test tool application and the corresponding field
lengths and encapsulated protocols.

Table 6-4 PPP header Protocol field values

field value (hex)   higher-layer protocol   field length (bytes)
0x0021              IP                      1
0x8021              IPCP                    2
0xC021              LCP                     2
0xC223              CHAP                    2
0xC023              PAP                     2

- 87 -
Institute for Microelectronics and
UMTS Gi Iinterfacing and Measuring System
Information Technology

Recall from chapter 2 that PPP comprises both a data and a signaling part. IPCP and LCP are used to configure PPP session parameters, whereas IP, CHAP and PAP packets carry user information; specifically, IP encapsulates user data, while PAP and CHAP are used during PPP session setup to authenticate peers. The SE examines the value of the Protocol field and updates a set of PPP-related counters (Table 6-5). If the Protocol indicates that an IP, CHAP or PAP packet is encapsulated in the PPP frame, the PPP data packets counter is incremented. If it indicates an IPCP or LCP packet, the PPP control packets counter is updated. Finally, in case the frame carries a protocol not meaningful for the Gi stack, the SE modifies the PPP invalid packets counter.

Table 6-5 PPP header fields and related counters

field value (hex)   statistics counter    higher-layer protocol
0x0021              PPP data packets      IP
0xC223              PPP data packets      CHAP
0xC023              PPP data packets      PAP
0x8021              PPP control packets   IPCP
0xC021              PPP control packets   LCP
other               PPP invalid packets   -
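The classification in Table 6-5 reduces to a simple lookup (sketch; counter names follow the text):

```python
PPP_DATA    = {0x0021, 0xC223, 0xC023}   # IP, CHAP, PAP
PPP_CONTROL = {0x8021, 0xC021}           # IPCP, LCP

def ppp_statistics(protocol, counters):
    """Update the PPP counters of Table 6-5 from one Protocol field value."""
    if protocol in PPP_DATA:
        key = "PPP data packets"
    elif protocol in PPP_CONTROL:
        key = "PPP control packets"
    else:
        key = "PPP invalid packets"
    counters[key] = counters.get(key, 0) + 1
```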

6.7.5 User Internet Protocol Statistics


The statistics maintained for the packet containing the actual user information are similar to those for the IP layer used to carry the packet across IP-based PDNs. The Internet Header Length value is related to an invalid UsrIP headers counter, the Total Length is added to the total UsrIP bytes received counter and the Header Checksum field is reevaluated for error detection. Differentiation comes in the Protocol and Destination Address fields: on the one hand the UsrIP packet can incorporate any transport layer protocol, and on the other hand the destination host of a particular packet can have any IP address. Therefore, examining the Protocol and Destination Address fields is not meaningful in the UsrIP layer.

Table 6-6 UsrIP header fields and related counters

header field             statistics counter
Internet Header Length   invalid UsrIP headers
Total Length             total UsrIP bytes received
Header Checksum          header checksum errors

Finally, there is the valid UsrIP packets counter, incremented every time a header successfully passes the Internet Header Length and checksum checks, and an invalid UsrIP packets counter, incremented whenever the header fails one or more of the checks.

6.7.6 Ethernet statistics


MAC layer statistics are maintained using the status word supplied by the IXF440 controller upon completion of a packet transfer onto the IX Bus. The packet status is appended to any packet completely transferred onto the IX Bus, in the access following the last byte transfer. It is a 32-bit quantity describing certain features of the Ethernet frame; the fields it comprises and their meaning are summarized in Table 6-7.

Table 6-7 IXF440 status word

bit name   bit number   description
LEN        31:16        packet length
-          15:11        reserved
MLT        10           multicast packet
BRD        9            broadcast packet
ROK        8            receive OK
FLW        7            flow-control packet
-          6            reserved
MER        5            MII error
RTL        4            too long packet
RNT        3            runt packet
DRB        2            alignment error
CRC        1            CRC error
OVF        0            receive FIFO overflow
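Decoding the status word of Table 6-7 is plain bit extraction; a software model (illustrative function name):

```python
def decode_status_word(word: int) -> dict:
    """Unpack the IXF440 packet status word per Table 6-7."""
    # flag names indexed by bit position 0..10 (bit 6 is reserved)
    flags = ["OVF", "CRC", "DRB", "RNT", "RTL", "MER", None, "FLW",
             "ROK", "BRD", "MLT"]
    status = {name: bool(word >> bit & 1)
              for bit, name in enumerate(flags) if name}
    status["LEN"] = (word >> 16) & 0xFFFF    # packet length, bits 31:16
    return status
```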

The LEN bits contain the packet length, whereas MLT, BRD, FLW and CRC are flags indicating whether the Ethernet frame is a multicast, broadcast or flow-control packet, as well as whether the frame contained a CRC error. The SE includes one counter for each field, updated according to the information contained in the corresponding status word field. The remaining fields are examined by the SE to decide whether the packet received contains any kind of error and should therefore be considered invalid. In essence, the IXF440 device maintains statistics on a packet basis, which it further hands over to the Analyzer for aggregation. Additionally, the SE compares the Destination Address contained in the Ethernet frame header against the MAC address assigned to the test tool; in case of a mismatch between the two addresses, it increments a MAC invalid destination address counter. Table 6-8 contains all the packet status word and Ethernet frame header fields used for MAC layer statistics gathering.

Table 6-8 IXF440 status word fields and related statistics

status word bit             statistics counter
LEN                         total MAC bytes received
MLT                         MAC multicast packets
BRD                         MAC broadcast packets
FLW                         MAC flow control packets
CRC                         MAC CRC errors
MER, RTL, RNT, DRB, OVF     MAC reception errors

MAC header field            statistics counter
Destination Address         invalid MAC destination address

6.7.7 Statistics Gathering Example


To illustrate SFP and SE functionality, we will describe the IP statistics gathering procedure, along with the related configuration parameters. The IP header structure is depicted in Figure 6-6, with the specific fields used for statistics highlighted.

version | IHL | TOS | total length
identification | R DF MF | fragment offset
TTL | protocol | header checksum
source IP address
destination IP address

Figure 6-6 Internet Protocol header (each row is 32 bits wide; the fields used for statistics are IHL, total length, MF, protocol, header checksum and destination IP address)

The SFP needs to isolate the IHL, total length, MF, protocol, header checksum and destination IP address fields. Knowing that the MAC header comprises 14 bytes, the 15th byte received by the RxDP will be the first byte of the IP header. The 15th byte is included in the 4th 32-bit word fetched from the IXF440 device (Figure 6-7). Therefore, once the SFP word counter equals four, the SFP performs a bitwise AND operation between the 32-bit word and the bitmask used to isolate the IHL field.

Figure 6-7 Packet header word instance: the 4th 32-bit word fetched from the IXF440 carries frame bytes 13-16, i.e. the last two MAC type/length bytes (13-14) followed by the first two IP header bytes - version/IHL (byte 15) and TOS (byte 16).

Each field isolated is expanded to a 32-bit word, with the actual field value in the lower bits and the remaining bits filled with zeroes. The 32-bit word is then stored to the Statistics Fields Manager FIFO. Once all fields present inside a header have been stored to the FIFO, the RxDP is signaled to initiate a new transaction with the IXF440 device. Additionally, all IP header fields are fed to the Header Checksum Engine, which recomputes the header checksum field and stores it to the SFM FIFO, after all other fields. The recomputed header checksum will then be compared by the SE against the actual one contained in the IP header.


The 32-bit words received from the IXF440 that contain the IP header fields, along with the related bitmasks and SE counters, are summarized in Table 6-9.

Table 6-9 IP statistics example

header field          SFP bitmask (hex)   IXF440 word   SE counter
IHL                   0x000000F0          4             invalid IP headers
total length          0xFFFF0000          5             total IP bytes received
protocol              0x000000FF          6             invalid higher layer protocol
header checksum       0xFFFF0000          7             header checksum errors
destination address   0x0000FFFF          8             invalid destination address
destination address   0xFFFF0000          9             invalid destination address
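The mask-and-shift mechanics of Table 6-9 can be sketched as follows (the word contents used in the tests are hypothetical example values, chosen only to exercise the masks):

```python
# (field, bitmask, IXF440 word index) as listed in Table 6-9
IP_FIELD_MASKS = [
    ("IHL",              0x000000F0, 4),
    ("total length",     0xFFFF0000, 5),
    ("protocol",         0x000000FF, 6),
    ("header checksum",  0xFFFF0000, 7),
    ("dst addr (low)",   0x0000FFFF, 8),
    ("dst addr (high)",  0xFFFF0000, 9),
]

def parse_ip_fields(words: dict) -> list:
    """Mask and right-align each statistics field, as the SFP does, and
    return the 32-bit words destined for the SFM FIFO."""
    out = []
    for name, mask, idx in IP_FIELD_MASKS:
        shift = (mask & -mask).bit_length() - 1
        out.append((name, (words[idx] & mask) >> shift))
    return out
```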


7. Architectural Optimization

7.1 Performance Requirements


What has been presented so far is a configurable Gi interfacing and measurement system architecture, in terms of parallel engines deployed and Xilinx RAM block utilization. The final Gi system architecture should be based on the test tool traffic throughput requirements and constraints.

7.1.1 Throughput Requirements


The test tool incorporates four 100 Mbit/s Ethernet ports, yielding a total throughput of 400 Mbps. The Ethernet mode imposing the most demanding throughput requirements is full-duplex operation, a MAC capability that allows simultaneous two-way transmission over point-to-point links. Full-duplex operation effectively doubles the link bandwidth, enabling each link to support full-rate, simultaneous, two-way transmissions. It is simpler and more efficient than traditional half-duplex transmission, because it involves no media contention, no collisions and no need to reschedule transmissions. The only requirement is a minimum-length gap between successive frame transmissions, known as the Inter-Frame Gap (IFG); the minimum IFG defined by the Ethernet standard equals the transmission time of 96 bits. When the test tool is used in such Ethernet networks, the Generator and the Analyzer each have to attend to 400 Mbps of egress and ingress traffic respectively. Practically speaking, the Generator should be able to fill 400 Mbps with “virtual” user packets and the Analyzer should be able to parse headers and collect statistics from network traffic arriving at the same rate.

7.1.2 Test Tool Constraints


Recall from chapter 4 that the basic time unit of the simulation is the tick, a multiple of the digital clock period the system is using. The tick provides a means for the test tool to concurrently maintain and manage a large number of timers, according to whose expiration Traffic Events are scheduled for transmission. The Generator must examine as many TP timers as possible within a tick, in order to generate at least 400 Mbps of network traffic from the ones that have expired. Similarly, the Analyzer must parse headers corresponding to 400 Mbps of traffic and update the corresponding statistics counters within a tick.

The test tool defines the minimum time interval required to schedule 4096 user transmit TEs as 3 us/6 us, depending on the digital clock frequency, which can be either 100 MHz or 50 MHz. This restriction imposes a lower bound of 300 clock cycles on the tick. In essence, the tick defines the minimum time interval within which the Generator is required to schedule enough transmit TEs to create a minimum of 400 Mbps of traffic.

7.2 Throughput Analysis


When combining the constraints and requirements mentioned previously, the result is that the test tool must be able to both generate and process 400 Mbps of network traffic within 300 clock cycles. The worst-case scenario for the Generator is realized when the tick has a value of 3 us/6 us and all packets sent to the network have their minimum length, since it will then have to schedule the most transmit TEs. Additionally, since many users may deploy the same TP, a single timer expiration could trigger more than one transmit TE. Nevertheless, the approach described hereafter assumes that each of the 4096 users incorporates a different TP.

The Ethernet frame must contain at least 46 bytes of data; a total of 14 header bytes complement the packet delivered to the IXF440 device, and finally the seven preamble, one start-of-frame delimiter and four CRC bytes are added. Therefore the minimum size of an Ethernet frame fmin is

fmin = datamin + SA + DA + Type/Length + PRE + SFD + CRC
     = 46 + 6 + 6 + 2 + 7 + 1 + 4
     = 72 bytes or 576 bits

400 Mbps of network traffic corresponds to 400 bits per microsecond, or 1200/2400 bits per tick, depending on the tick value. Consequently, the Generator must assemble and transmit n frames within a tick:

n = 1200 / 576 ≈ 2 frames per 3 us tick, or
n = 2400 / 576 ≈ 4 frames per 6 us tick,

excluding the IFG times. With respect to the Generator functionality described earlier, and considering the case where each user incorporates a different Traffic Profile: the Traffic Profile Engine must locate the 2/4 expired timers per tick; the User Profile Engine must go through all User Profiles, locate the 2/4 UPs deploying the particular TPs and store the corresponding transmit TEs to the Transmit Scheduler FIFO; and the Data Pump must be able to fetch 2/4 FIFO entries, assemble the corresponding packets and deliver them to the IXF440 device.

On the ingress side, the worst-case scenario for the Analyzer is realized when the full Gi protocol stack - incorporating L2TP and PPP - is deployed, since more statistics fields have to be stored and therefore more counters have to be updated. Specifically, the minimum size of an Ethernet frame fmin will be

fmin = PRE + SFD + Ethheader + IPheader + UDPheader + L2TPheader + PPPheader + UsrIPheader + datamin + CRC
     = 7 + 1 + 16 + 20 + 8 + 16 + 2 + 20 + 8 + 4
     = 102 bytes or 816 bits,

without counting the IFG times. As a result, the Analyzer must receive and parse n frames per tick:

n = 1200 / 816 ≈ 1.5 frames per 3 us tick, or
n = 2400 / 816 ≈ 3 frames per 6 us tick.
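The frame-size and frames-per-tick arithmetic of this section can be checked with a few lines of code (the helper function is illustrative; the byte counts are the ones used in the text):

```python
# Worst-case frame sizes from section 7.2 (bytes)
GEN_FMIN = 46 + 6 + 6 + 2 + 7 + 1 + 4                 # plain Ethernet frame
ANA_FMIN = 7 + 1 + 16 + 20 + 8 + 16 + 2 + 20 + 8 + 4  # full Gi stack frame

def frames_per_tick(tick_us: float, fmin_bytes: int, rate_mbps: int = 400):
    """Frames that must be generated/parsed per tick to sustain the rate
    (IFG times excluded, as in the text)."""
    bits_per_tick = rate_mbps * tick_us   # 1 Mbps == 1 bit per microsecond
    return bits_per_tick / (fmin_bytes * 8)
```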


7.3 Generator Tuning


The aim of the final Generator and Analyzer architectures is to be able to sustain the 400 Mbps throughput of the network links. The Generator must be able to assemble a maximum of two/four “virtual” user frames within a tick. In order to achieve the desired functionality, it incorporates

• 16 timer management engines within the Traffic Profile Engine

• 32 parallel searching engines in the User Profile Engine

• a 512-entry Transmit Scheduler FIFO queue

• a Transmit Data Pump engine

Each of the 16 timer management engines is responsible for 256 TP timers; similarly, each of the 32 UPE engines handles 128 transmit Traffic Events. The timer managers must find at most 2/4 expired timers, if there are any, so that 2/4 user transmit TEs can be scheduled for transmission. The timer management engines sequentially access and decrement timer durations until they find an expired timer. The UPE interfacing module then signals the engines to stop execution, finds the engine(s) that yielded the match(es), retrieves the corresponding TP IDs and lengths, buffers them and, if the current number of matches is below four, signals the engines to resume execution. Once the 2/4 timers needed have been found, or the engines have gone through the entire timer list, the TP IDs and lengths are handed over to the UPE engine.

The Xilinx RAM blocks in which timer durations reside can perform both a read and a write operation within a single clock cycle. Therefore, each timer manager engine can theoretically search all the timers it is in charge of in 256 clock cycles, comfortably less than the worst-case tick value. Additionally, there is the delay introduced by stalling the timer management engines and identifying the TP ID, which fits within the 44 cycles remaining. The time interval for which engines are stalled is very short, since there will be at most two/four matches and consequently two/four “stall cycles”.

The UPE engines must search the entire list of transmit TEs they manage and locate the ones incorporating the TP IDs supplied by the Traffic Profile Engine. For each match yielded, the UPE engines suspend execution and the Transmit Scheduler interfacing module fills in a FIFO entry. The engines then resume their task until the next match is found or until they reach the end of the transmit TE list. The difference between the TPE and the UPE is that more than one User Profile may incorporate the same TP, resulting in an increased number of matches. With each UPE engine managing 128 transmit TEs, there are 172 cycles left to handle the stalls induced when a match is found, which is considered adequate for normal system operation.
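The cycle budgets quoted above follow directly from the engine counts (a quick arithmetic check; variable names are illustrative):

```python
TICK_CYCLES = 300        # worst-case tick: 3 us at 100 MHz / 6 us at 50 MHz

# each engine scans its share of the 4096 entries, one entry per cycle
tpe_scan = 4096 // 16    # timers per timer-management engine
upe_scan = 4096 // 32    # transmit TEs per UPE engine

tpe_slack = TICK_CYCLES - tpe_scan   # cycles left for stall handling (TPE)
upe_slack = TICK_CYCLES - upe_scan   # cycles left for stall handling (UPE)
```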

A relatively large Transmit Scheduler FIFO size was selected to accommodate the varying packet transmission times: the larger a packet is, the more time it needs to be transmitted, and meanwhile simulation time advances and additional transmit TEs will have to be served. The FIFO entries are organized in 3 Xilinx RAM blocks, each containing one FIFO entry element. This is a more efficient memory organization than sequentially storing FIFO entries, because the entries still occupy the same space, but each FIFO entry can be retrieved in a single clock cycle.

Finally, the Transmit Data Pump can manage the required throughput with a single engine, since the memory access setup times along with the actual memory accesses can be carried out within the 75 clock cycles the Data Pump has at its disposal to fill an IXF440 FIFO with a 64-byte Ethernet frame.

7.4 Analyzer Tuning


The Analyzer has to receive and parse 1.5/3 Ethernet frames per tick. In other words, all packet header words must be retrieved from the MAC controller and their statistics-related fields must be isolated within 2 us. Therefore, the Analyzer has 200/100 clock cycles at its disposal to retrieve and process a frame from the IXF440 device. Considering that each frame contains a total of 18 fields/field sets to be isolated and stored, and that one field can be stored per clock cycle, approximately 80 clock cycles remain to fetch a packet header from an IXF440 FIFO. Additionally, one header word can be transferred over the IX Bus in one clock cycle, plus the setup times of the FIFO access signals.

Consequently, one instance of each Analyzer component is adequate to attend the 400Mbps throughput. Specifically, the Analyzer comprises

• a Receive Data Pump engine,

• a Statistics Fields Parser,

• a Statistics Fields Manager,

• a Statistics Engine.

The Statistics Fields Manager FIFO is implemented in one Xilinx RAM block and can
accommodate a total of 512 fields and/or field sets. Since the overall fields/field sets
used in statistics gathering are 19, the FIFO can store the statistics fields of

⎣512 / 19⎦ = 26 packets.

The Statistics Engine's task is simply to fetch header fields and update counter values. Therefore it can easily sustain the average 3 frames/tick throughput, since it has 100 clock cycles at its disposal to update all statistics counters related to a frame. As a result, an SFM FIFO incorporating one Xilinx RAM block is considered sufficient to prevent any frames from being discarded.


8. Implementation and Testing


8.1 HDL-based Design Flows
Both the Generator and the Analyzer are implemented in the Virtex-II FPGAs present on the test tool board. In general, FPGA-oriented hardware architectures follow the design flow below. Initially, a model is developed to realize the circuit logic and simulated to discover any logical errors. Thereafter, the design is transformed into a netlist using a logic synthesizer; the netlist model may be simulated to verify that the design requirements are met. The netlist is then processed by vendor-specific implementation tools and mapped to FPGA resources. A timing simulation netlist may optionally be produced; this netlist reflects the design behavior in the actual circuit and is used for troubleshooting timing issues. Finally, the implementation tools create a bitstream, which is downloaded into the physical FPGA and programs it to perform the circuit functions.

The best practice when implementing hardware modules that go through the synthesis process is to use a Register-Transfer Level (RTL) approach [27]. In RTL design, the circuit behavior is modeled as a set of registers and transfer functions, the latter describing the flow of data between registers. The RTL model developed is first simulated to verify syntactical correctness and functionality. It is then transformed by a logic synthesizer into a gate-level representation, using libraries specific to the FPGA technology. Libraries contain primitive elements - i.e. flip-flops, memory cells and logic gates - with accurate timing information and other parameters depending on the technology the FPGA vendor incorporates.

Additionally, during the synthesis process the designer specifies timing and/or area
constraints - the desired clock frequency and/or FPGA area occupation. The logic
synthesizer calculates the maximum clock frequency the design can achieve along with
the FPGA die area it occupies. However, the values calculated are approximations of the
ones achieved for the real circuit. The actual values can only be supplied by the
implementation tools after they have mapped and routed the design into the FPGA [28].
Additionally, both the implementation tools and the logic synthesizer may create a
simulation netlist. Post-synthesis simulation aims at the functional verification of the
design, whereas post-place and route simulation is used for troubleshooting timing
problems, in case timing constraints are not met. Finally, a bitstream is created and
downloaded to the FPGA to configure the device in order to execute the function
desired.

8.2 Xilinx Integrated Software Environment


Xilinx offers a development environment addressing all stages of HDL-based design
flows by means of the Integrated Software Environment (ISE) [29]. ISE is a software
suite comprising simulators, logic synthesizers, mapping and place-and-route tools, all
accessible through a single Graphical User Interface. Simulators and logic synthesizers
are either third-party products purchased separately and integrated in the ISE or
proprietary tools developed by Xilinx. The final selection of the actual tools to be used
depends on the preferences of the hardware engineer.

8.3 Xilinx Implementation Process


The term implementation tools refers to the proprietary software developed by each
FPGA vendor, residing below the logic synthesizer in the HDL-based design hierarchy.
Implementation tools accept a gate-level netlist from logic synthesizers and transform it
into the actual design implemented in the FPGA. The Xilinx implementation process
consists of four distinct parts [30]: translation, mapping, placement and routing, and
bitstream generation (Figure 8-1).

[Design flow diagram: HDL description -> synthesis -> translation (NGD) -> mapping ->
placement and routing -> bitstream generation; functional verification by simulation
accompanies each netlist stage and timing analysis follows implementation]

Figure 8-1 Xilinx design flow

Translation accepts the gate-level netlist produced by the logic
synthesizer and transforms it into a Xilinx-specific format that describes the logic
design in terms of elements such as flip-flops, gates and RAMs, according to Xilinx
Native Generic Database (NGD) primitives. The resulting file is then handed over to the
mapping tool, which maps the design to the components present in the target FPGA -
LUTs, IOBs, block SelectRAMs, etc. The output of the mapping process is a Native
Circuit Description (NCD) file, which can be placed and routed. After the design is placed and routed, a
new NCD file is produced and fed to the bitstream generator. The bitstream generator
outputs a binary configuration file which is downloaded to the FPGA memory cells, so
that the FPGA is programmed to perform the function desired.

8.4 Implementation Goals


The Generator and Analyzer designs were developed according to the previously
mentioned design flow. The main goals of the overall procedure were:


• to provide a proof-of-concept implementation,

• to gain hands-on experience with hardware design languages and the related
  development tools,

The development process should conclude with models of the instantiated architectures
presented in chapter 7, whose functionality would be verified through simulation of the
corresponding RTL description. Furthermore, the models should use synthesizable
constructs15 and successfully pass the synthesis and place-and-route phases, satisfying
requirements in terms of speed and area occupation. Specifically, they should be able to
operate at clock frequencies of at least 50MHz - the test tool board's slowest clock - and
fit within the resources provided by the Virtex-II devices.

Additionally, the implementation should conform to the test tool throughput requirements
and be able to sustain the 400Mbps aggregate throughput provided by the four 10/100Mbps
rear-transition module Ethernet ports. According to the throughput analysis of chapter 7,
the Generator should locate and serve at least four timer expiries within a tick, whereas
the Analyzer should retrieve and parse at least three frames within the same time interval.
Finally, the relationship between the engines deployed, the actual throughput achieved
and the FPGA area occupied was studied to examine the scalability of the architectures
proposed.

8.5 Electronic Design Automation Tools


The Generator and Analyzer designs were modeled using the VHSIC Hardware
Description Language (VHDL). Specifically, the Register-Transfer Level methodology
was adopted, to facilitate logic synthesis and the translation of the architecture to a gate
level design, appropriate for the Xilinx software tools to process. The functionality of the
models was verified by developing and simulating test scenarios, based on VHDL test
benches. The widely-deployed Modelsim simulator was used for functional verification,
whereas the Xilinx Synthesis Technology (XST) engine was utilized for synthesizing the
design. Finally, the proprietary Xilinx implementation tools translated, mapped, placed
and routed the design onto the Virtex-II device.

8.6 Testing
The VHDL models of the Gi system modules were simulated in order to verify their
functionality. Simulation was based on a set of scenarios and test benches, studying
particular aspects of the implementation. The procedure of designing and implementing
the test scenarios is of particular importance to the architecture verification process, as
the test benches can be used throughout the design flow, allowing for the same tests to
be repeated in all phases. The same test benches written for the initial functional
verification may later be used for simulating the gate-level netlist produced by the logic
synthesizer and/or for post-place-and-route timing analysis.

15
Synthesizable constructs are the subset of Hardware Description Language constructs that a logic
synthesizer is able to identify and process.

8.6.1 Testing Architecture


The test scenarios were implemented using VHDL test benches. Test benches are in
essence circuit wrappers, enclosing the logic under test and applying specific data
and stimuli to it. The major tools in the testing process were (Figure 8-2)

• text files,

• wire taps.

Text files contain any necessary initialization elements, along with data obtained from the
wire taps. Wire taps are circuit observation points, capturing and storing data for cross-
checking against the data that the module under test should actually produce and/or for
further processing upon simulation completion. The test benches incorporate processes
that read/write data to the text files and, where needed, compare data collected during
simulation with the expected results.

[Block diagram: the test bench wraps the circuit logic; initialization data and stimuli are
read from a text file and applied as inputs, while outputs and wire-tap observation data
are written back to a text file]

Figure 8-2 Generic test bench block diagram

Furthermore, the testing procedure followed a modular approach, according to which
each module was tested separately; once the functionality of all modules had been
verified, the overall system behavior was examined.

8.6.2 Generator Verification


On the Generator side, the Traffic Profile Engine, User Profile Engine, Transmit
Scheduler and Data Pump implementations were examined. The overall procedure is
similar for all modules: initialization data needed by each module - such as Traffic Profiles
for the TPE, User Profiles for the UPE and transmit Traffic Events for the Data Pump - is
retrieved from text files and stored in internal Xilinx memories or fed directly to the
module as inputs. Additionally, the module test bench generates stimuli and stores
specific data captured by module outputs and/or wire taps to text files.

For the sake of clarity, we shall present the Transmit Data Pump testing procedure. The
TxDP should retrieve Transmit Scheduler FIFO entries, assemble packets and hand
them over to the IXF440 controller. A number of FIFO entries is manually entered in a
text file. The FIFO entries specify that the TxDP should fetch a certain number of
headers for each protocol - e.g. 30 MAC, 25 IP and 20 UDP headers. Furthermore, the
actual packet headers are also included in a text file. Additionally, complementary logic
enabling the test bench to identify the first and last word of each packet header is
incorporated into the simulation model.

The test bench first reads packet headers from the text file and applies the data
read as input data to the memory through its microprocessor interface. It then reads in
FIFO entries, parses them and applies them as inputs to the TxDP interface with the
Transmit Scheduler. The test bench wire taps capture the first and last memory
addresses corresponding to a protocol-specific packet header, as well as the total
number of words the TxDP supplies to the IXF440 controller. The values captured are
then compared against the results expected, which are calculated prior to the simulation
and stored in a third text file. In case of a mismatch, an error indication flag is raised,
pointing out the exact point of the simulation where the mismatch condition occurred.
Furthermore, the total number of header words in conjunction with the simulation time
interval yields the Data Pump throughput. Moreover, the test bench emulates the
behavior of the IXF440 device by applying stimuli to the related TxDP inputs, in order to
examine the TxDP operation under various conditions, such as its reaction to the
interruption of a transfer due to the IXF440 FIFO being full.

8.6.3 Analyzer Verification


On the ingress side, the Receive Data Pump engine, Statistics Fields Parser, Statistics
Fields Manager and Statistics Engine modules were simulated. The test setup is
analogous to that of the Generator, incorporating text files, wire taps and signals
facilitating the monitoring process. The most interesting modules in terms of simulation
were the SFP and the RxDP. The RxDP test bench is analogous to the one used for the
TxDP; consequently, only the SFP test bench operation will be briefly presented.

The SFP test bench reads in packet headers word by word from a text file and feeds
them to the SFP. The SFP then parses each word and outputs the fields contained
therein. The actual outputs are stored in another text file along with the packet header
word they were retrieved from, a string describing the kind of protocol header they
belong to and the actual header field the output is supposed to be - destination address,
header checksum, etc. The next word is read from the file only after the current one has
been parsed and the corresponding fields have been written to the related file.
Additionally, the Header Checksum Engine functionality was examined by comparing its
outputs against header checksums calculated by a simple program written in C.
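The reference program was written in C; a functionally equivalent sketch of the standard Internet checksum (RFC 1071) over an IPv4 header is shown here in Python for compactness. The sample header is a well-known textbook example, not taken from the thesis test data.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """Internet checksum (RFC 1071) over an IPv4 header whose
    checksum field (bytes 10-11) has been zeroed."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]  # 16-bit big-endian words
    while total > 0xFFFF:                          # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                         # one's complement

# Well-known sample header (checksum field zeroed); its checksum is 0xB861.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_header_checksum(sample)))  # → 0xb861
```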

8.6.4 Test Data Acquisition


For the purposes of the Gi system functional verification, there should be packets
incorporating the full Gi protocol stack headers and packets including only the MAC and
IP headers to test the functionality of the “no L2TP signaling” mode. Additionally, the
Traffic Profiles contained in text files should reflect real-world mobile network traffic.

Keletron had developed a proprietary traffic generator, which could create IP over PPP
over L2TP traffic and send it out to the network. The particular software was used to
collect valid packet headers for the Gi system simulation. The actual data collection
setup is depicted in Figure 8-3. The traffic generator was running on a Linux host
located in the Keletron domain. Another host was connected to the former via a
router and was running a network monitoring utility, the Ethereal network analyzer [31].
The generator sent packets to the remote host, where they were captured by Ethereal
and stored in a text file. The packet headers contained in the text file were used to
feed valid data to the Analyzer during the simulation process. A similar setup was used
to capture plain IP traffic incorporating only MAC and IP headers.

[Setup: traffic generator host -> router -> Ethereal sniffer host]

Figure 8-3 Packet header collection setup
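The capture-to-simulation path can be sketched as follows; the exact export format and the 32-bit word width are illustrative assumptions (Ethereal can export packet bytes as plain hex, and the test benches read one word per text-file line):

```python
def hexdump_to_words(dump_lines):
    """Collect hex bytes from a plain-hex capture export (hypothetical
    format) and group them into 32-bit words, one per output line, as a
    VHDL test bench could read them via std.textio."""
    data = "".join(line.replace(" ", "").strip() for line in dump_lines)
    data += "0" * (-len(data) % 8)  # pad the tail to a full 32-bit word
    return [data[i:i + 8] for i in range(0, len(data), 8)]

lines = ["45 00 00 73 00 00 40 00", "40 11 b8 61"]
print(hexdump_to_words(lines))  # ['45000073', '00004000', '4011b861']
```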

The construction of realistic Traffic Profiles was based on examples of existing packet-
switched applications and UMTS bandwidth allocation practices. Packet-based services
require a transport mechanism from the source to the destination of the service that
fulfills certain requirements, so that the user experiences a satisfactory result. The
transport mechanism - the air interface and the network infrastructure - is the same for
all mobile users. However, each user can reserve a different share of the mechanism's
resources and thereby ensure a better result. The means for bandwidth reservation are
provided by the network Quality of Service architecture, realized in UMTS through four
different QoS classes. The TPs used in the Gi system simulation were based on
representatives of the QoS classes.

8.7 UMTS QoS classes


Network services are considered to be end-to-end, namely from a mobile station to
another mobile station or to a host in the fixed infrastructure. In order to ensure a certain
QoS for a particular application, the characteristics and functionality of the data path
between the service source and destination must be set up. This includes bandwidth
allocation and reservation in the Radio Access Network, the Core Network and external
networks.

To distinguish amongst the requirements of different applications, UMTS defines four
QoS classes [32], differentiated by the delay sensitivity of the traffic they are supposed
to carry:

• conversational,

• streaming,

• interactive,

• background.

The conversational class carries the most demanding traffic in terms of delay,
such as video telephony and telephony speech. The streaming class is utilized in one-
way transports where delay requirements are still demanding but less stringent, e.g.
Video on Demand and High Definition Television. The interactive class is destined for
traditional Internet applications where a level of interaction is required, for example web
browsing and e-mail. Finally, the background class is the most delay-insensitive one,
utilized whenever the destination does not expect data to arrive within a specific time
interval, e.g. MP3 downloading.

8.8 Example Applications, Data Rates and Packet Sizes


The TPs' characteristics were formed using a number of example applications as a valid
basis. The applications chosen, along with the range of data rates they require16 and the
QoS class they belong to [33], are summarized in Table 8-1.

Table 8-1 UMTS packet-based applications and data rates

QoS class        application              data rate (kbps)
conversational   video telephony          32 - 384
                 VoIP telephony           17.5 - 83 (17)
streaming        music/speech streaming   5 - 128
                 video on demand          20 - 384
                 high definition TV       20 - 384
interactive      file transfers           -
                 web browsing             -

There are a few points to note regarding Table 8-1. First of all, non-real-time
applications do not have any particular QoS requirements and therefore no specific data
rate. Additionally, real-time applications have varying data rates depending on the actual
coder/decoder they incorporate and the way application data is encapsulated into
packets. Last but not least, we should mention that the data rates included in Table 8-1
are derived from the QoS requirements imposed by the applications alone; the network
may not always be able to provide them.

16
The source for the data rate values is [33] unless otherwise noted.
17
Source: “Voice over IP - Per Call Bandwidth Consumption”, Cisco corporation


For interactive and background applications, an arbitrary bit rate could be assumed.
Specifically for web browsing, taking into consideration that an HTML page not including
images, video and/or audio clips has an average size of around ten kilobytes and that,
according to [33], the end user should not experience a delay longer than 4 seconds for
each web page, yields a data rate equal to 20kbps. Similarly, using the download
speeds experienced by computers using a 56K modem, a 48kbps data rate can be used
for bulk data transfers - we should note, though, that UMTS provides the capability for
higher connection speeds. For conversational and streaming applications, the examples
presented in [34] were the reference point. In [34], three use cases are presented,
regarding Voice over IP calls, video streaming and video telephony. The resulting data
rates and packet sizes for each application are summarized in Table 8-2.

Table 8-2 UMTS multimedia applications characteristics

application        data rate (kbps)   packet size (bytes)   codec
VoIP               29                 72                    AMR
video streaming    59                 117                   H.263/MPEG-4
video telephony18  33                 67                    AMR
                   27                 292                   H.263/MPEG-4

Table 8-2 data rates refer to the total bandwidth consumed, including application data
and the transmission overhead. Additionally, the packet size is calculated assuming IPv4
packets. IPv6 packets result in larger packet sizes, since the IPv6 header comprises 20
extra bytes. Note that the row describing video telephony characteristics includes two
sub-rows. The first one describes the audio stream and the second one the video
stream; therefore the total bandwidth required for a video call is 60kbps.

Having gathered the information needed, bandwidth requirements and packet sizes were
translated into timer durations - in ticks - that would be used in the actual simulation. The
set of applications, data rates, packet sizes and corresponding tick values can be viewed
in Table 8-3.

Table 8-3 Multimedia applications and timer durations

application            data rate (kbps)   packet size (bytes)   timer duration (ticks)
VoIP                   29                 72                    3310
unidirectional video   59                 117                   2644
video telephony        33                 67                    2707
                       27                 292                   14420
speech streaming       29                 72                    3310
file transfer          48                 1400                  38889
web browsing           20                 1400                  93334

18
A video call containing both voice and image consists of two separate media streams, one carrying
audio data and one carrying video data.


The timer duration values td are derived using the following formula

td = (n / d) x (1 / tick duration)

where n is the packet size in bits, d is the data rate and tick duration is the tick value,
all expressed in consistent units. Since the primary goal was a hardware implementation
running at 50MHz minimum, the tick duration was set to 6 microseconds - 300 clock
cycles at 50MHz. Finally, let us note that the packet size of 1400 bytes was chosen for
file transfer and web browsing applications, as it is the size that avoids IP fragmentation
in the PS domain according to [35].
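The timer-duration computation can be checked with a few lines of Python. The 6 µs tick and the table values come from the text; since the tabulated values are rounded to whole ticks (the rounding direction varies slightly in the original table), the check below allows a tolerance of one tick.

```python
TICK_S = 6e-6  # tick duration: 300 cycles at 50 MHz

def timer_ticks(packet_bytes, rate_kbps):
    """td = (n / d) / tick duration, with n in bits and d in bits/s;
    returns the (fractional) number of ticks between two packets."""
    return (packet_bytes * 8) / (rate_kbps * 1000) / TICK_S

# (data rate kbps, packet size bytes, tabulated ticks) from Table 8-3
table_8_3 = [(29, 72, 3310), (59, 117, 2644), (33, 67, 2707),
             (27, 292, 14420), (48, 1400, 38889), (20, 1400, 93334)]
for rate, size, expected in table_8_3:
    assert abs(timer_ticks(size, rate) - expected) < 1
```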

8.9 Simulation
The purpose of the simulation was threefold. First, the functionality of the Generator
and the Analyzer was verified. Second, it should be ensured that the Generator and the
Analyzer could sustain the total throughput of a 400Mbps full-duplex Ethernet
connection. Finally, the limits of the proposed architecture should be investigated.

As mentioned earlier, the Gi system incorporates two applications, "L2TP signaling"
and "no L2TP signaling". In the description that follows, the test scenarios of the "no
L2TP signaling" mode will be presented for the Generator and those of the "L2TP
signaling" mode for the Analyzer. The simulation of the complementary mode for each
module is analogous to the one described.

In general, the simulation process was incremental, following three stages:

• initial verification,

• verification under realistic scenarios,

• full throughput operation study.

The initial verification stage aimed at proving the basic functionality for both modules.
Arbitrary timer durations and packet lengths were used for the Generator, whereas the
Analyzer was fed with a limited number of standard header words in order to examine
their behavior in short simulation intervals. Thereafter, realistic Traffic Profiles derived
from the aforementioned example applications were configured to study Generator
behavior in real-world scenarios, still not investigating its full potential. The scenario
used is given in Table 8-4.

Table 8-4 Simulation scenario

application            data rate (kbps)   number of users   total bandwidth (kbps)
VoIP                   29                 30                870
unidirectional video   59                 10                590
video telephony        60                 20                1200
speech streaming       29                 5                 145
file transfer          48                 40                1920
web browsing           20                 50                1000

On the other hand, the Analyzer was provided with realistic packet header data, as
retrieved by the Ethereal network sniffer. Once the system functionality had been
verified, full throughput operation was studied to cross-check that the Gi modules were
able to generate and parse network traffic at the full link rate. As was the case for the
initial verification, arbitrary timer durations and packet lengths were used for the
Generator at first. When it came to testing full throughput operation with realistic
data, an issue was raised: the example applications and data rates could not generate
400Mbps of network traffic when 4096 users are supposed to be present in the network.
Consider a situation where all users deploy the most bandwidth-demanding service
included in the examples, namely video telephony. Then the overall traffic generated
would be equal to

4096 x 60 = 245760 kbps,

well below the 400Mbps required. In order to overcome this limitation, a mechanism was
incorporated: once the User Profile Engine found a matching transmit TE, it scheduled
four packets for transmission, one on each Ethernet port, instead of scheduling only the
one found. Therefore, a scenario had to be created that would generate 100Mbps of
network traffic; the modified architecture would translate it into 100Mbps on each port.
The actual scenario used is shown in Table 8-5.

Table 8-5 Stress testing simulation scenario

application            data rate (kbps)   number of users   total bandwidth (Mbps)
VoIP                   29                 450               14.50
unidirectional video   59                 150               8.85
video telephony        60                 200               12
speech streaming       29                 50                1.45
file transfer          48                 800               38.40
web browsing           20                 1150              23
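A quick sanity check of the stress scenario, using the rate and user counts from Table 8-5 (note that the computed per-port load comes out slightly under the tabulated totals, since 29 kbps x 450 users is 13.05 Mbps rather than the listed 14.50):

```python
# (data rate kbps, number of users) per application, from Table 8-5
scenario = {"VoIP": (29, 450), "unidirectional video": (59, 150),
            "video telephony": (60, 200), "speech streaming": (29, 50),
            "file transfer": (48, 800), "web browsing": (20, 1150)}

per_port_kbps = sum(rate * users for rate, users in scenario.values())
print(per_port_kbps / 1000)      # close to the 100 Mbps per-port target
print(4 * per_port_kbps / 1000)  # aggregate over the four Ethernet ports

# Realistic rates alone cannot reach 400 Mbps: even 4096 users of the most
# demanding service (video telephony, 60 kbps) yield only
print(4096 * 60 / 1000)          # 245.76 Mbps
```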

On the Analyzer side, simulation was straightforward, as the process of supplying its
modules with data at the full rate was simple. Since packet header words were stored in
text files, reading one word per clock cycle - with the clock running at 50MHz - resulted
in a total throughput of 1.6Gbps, adequate for the purposes of the verification.

8.10 Synthesis
Having verified the Gi modules' functionality, the next step was to proceed to the
synthesis process. Synthesis was performed using the Xilinx Synthesis Technology
(XST) tool, included in the ISE software.


The goal throughout synthesis was to achieve the highest operating speed possible for
both the Generator and the Analyzer, in order to examine the limitations of the
architectures. The initial runs pointed out that the modules' logic occupied a satisfactory
share of the FPGA resources and that no area constraints needed to be supplied to XST,
enabling us to focus on speed optimization. Thereafter, XST was configured to target
the maximum clock frequency for each module. The final results in terms of FPGA
resource utilization and the maximum clock frequencies for the Generator and the
Analyzer are summarized in Table 8-6 and Table 8-7 respectively.

Table 8-6 Generator synthesis results

resource           available   used   utilization (%)
slices             14336       1700   11
slice flip-flops   28672       1007   3
4-input LUTs       28672       2648   9
bonded IOBs        484         306    63
BRAMs19            96          68     70

clock signal: minimum period 9.362 ns, maximum frequency 106.862 MHz

Table 8-7 Analyzer synthesis results

resource           available   used   utilization (%)
slices             14336       704    4
slice flip-flops   28672       640    2
4-input LUTs       28672       1023   3
bonded IOBs        484         293    60
BRAMs              96          6      6

clock signal: minimum period 8.23 ns, maximum frequency 121.506 MHz

It is obvious that the Generator design consumes a larger amount of FPGA resources,
mainly due to the fact that a lot of information has to be stored in Xilinx RAM resources
and that the logic required for managing the information is relatively complex.
Consequently, the design critical path is longer and the signal propagation delay is
higher.

8.11 Implementation
Finally, the design went through the implementation process. Initially, the netlist
produced by XST is transformed into a format appropriate for the mapping and
place-and-route tools to process. The mapping procedure correlates design logic with
FPGA resources, and finally the place-and-route (PAR) tool places the design into the
FPGA and forms the actual interconnections between FPGA elements.

19
BRAMs is the abbreviation used by the synthesis tool for block SelectRAMs


The area occupation and clock frequency results produced by the synthesis tools are an
approximation of the actual ones achieved by the implementation software. Therefore,
the actual performance of the hardware architecture is dictated by the post-PAR timing
and post-map resources utilization reports.

In general, placement and routing were timing-driven. Timing constraints were specified
through the PAR tool configuration environment in order to achieve the maximum clock
speed possible. Additionally, the ISE Pin-out Area Constraints Editor (PACE) was used
to assign the FPGA pins to the design inputs/outputs. The pin-assignment task
depends on the Printed Circuit Board design policy of each project. In the typical case,
the PCB is designed and fabricated first; based on the PCB design, the engineer must
then edit the HDL modules' pin locations according to the FPGA track interconnections.
Nevertheless, in cases where dense logic is expected to be implemented in the FPGAs,
the reverse process is followed: the modules are implemented first and the PCB design
is based on the way the architecture inputs/outputs are mapped to FPGA I/O pins. The
particular project was "built" on an already fabricated PCB. The final results, both in
terms of FPGA resource utilization and clocking characteristics, are summarized in
Table 8-8 and Table 8-9.
Table 8-8 Generator implementation results

resource           available   used   utilization (%)
slices             14336       1848   12
slice registers    28672       727    2
4-input LUTs       28672       3165   11
bonded IOBs        484         306    63
BRAMs              96          68     70

clock signal: period 11.993 ns, frequency 83.382 MHz

Table 8-9 Analyzer implementation results

resource           available   used   utilization (%)
slices             14336       478    3
slice flip-flops   28672       766    2
4-input LUTs       28672       1278   4
bonded IOBs        484         293    60
BRAMs              96          6      6

clock signal: period 9.752 ns, frequency 102.543 MHz

As expected, the Analyzer can achieve a higher operating frequency than the
Generator. Furthermore, the choice to design the system around the "conservative"
50MHz clock/6µs tick value was justified, since the Generator can be fed a clock
running at 83MHz at most.

8.12 Architecture Throughput Limitations


So far it has been shown that the proposed Generator and Analyzer architectures can
sustain the 400Mbps throughput required. This subsection will show that the exact same
architectures can sustain an 800Mbps throughput without incorporating additional
engines in either the Generator or the Analyzer.

Regarding the Generator, the bandwidth bottlenecks are the Traffic Profile and User
Profile engines, since the Transmit Scheduler is a temporary buffer which can have an
arbitrary size, whereas the Transmit Data Pump is a word transfer engine from header
memories to IXF440 FIFOs. The Transmit Data Pump can transfer one 32-bit word in
every clock cycle. Considering a clock frequency of 50MHz, the Transmit Data Pump
throughput TxDPthr can be as high as

TxDPthr = 50000000 x 32 = 1.6 Gbps

The TPE throughput, on the other hand, is affected by two major factors, namely the
number of timers a timer management engine is responsible for and the number of clock
cycles required to serve a timer expired. Timer service is translated into handing the
corresponding TP ID and TP length to the UPE. Each timer management engine can
both fetch and store one timer per clock cycle, therefore it requires number of TPs/tme
clock cycles to examine all timers it is in charge of. Additionally, the timer serving delay
consists of the cycles needed to identify the timer management engines that yielded the
match and fetch the corresponding TP lengths; the TP IDs are related to the memory
address of the timer expired. Finally, the values retrieved should be communicated to
the UPE. All of the aforementioned operations can be performed within a single clock
cycle each, resulting in a total of 3 clock cycles for a “match stall cycle”. The worst-case
scenario is realized when each “match stall cycle” yields a single match and the tick is
300 clock cycles. Consequently, the total number of timers that can be served by the
TPE will be

number of TPs/tme + match stall cycle x number of matches < 300

In the proposed architecture, number of TPs/tme equals 256, leaving a total of 44
cycles to serve the matches. Considering an even more conservative value for the
match stall cycle, equal to 5 clock cycles, the total number of matches can be 8;
therefore 8 timers can be served within a tick.

The UPE throughput can be calculated in a similar fashion to the TPE one. Specifically, it
depends on the number of users a UPE engine is in charge of and the "match stall cycle"
introduced by matches. Therefore, the following condition should hold

number of users/upe + match stall cycle x number of matches < 300

The match stall cycle consists of identifying the user ID - related to the transmit TE
memory address - and filling in a scheduler entry with the user ID, the transmit TE and
the corresponding TP length. In the proposed architecture, number of users/upe equals
128, leaving a total of 172 cycles to serve the matches. The actual number of matches
depends on the number of users incorporating the TP that has expired. In the worst-case
scenario, each user will incorporate a different TP; therefore 8 matches will have to be
found to keep up with the TPE throughput. According to the previous formula, this can
easily be achieved.
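The cycle-budget reasoning for both engines condenses into a few lines of arithmetic (the 300-cycle tick, the engine sizes and the conservative 5-cycle match stall are taken from the text):

```python
TICK_CYCLES = 300  # 6 µs tick at a 50 MHz clock

def served_per_tick(entries_per_engine: int, match_stall: int = 5) -> int:
    """Matches an engine can serve per tick after scanning all its
    entries: entries + stall * matches <= TICK_CYCLES."""
    return (TICK_CYCLES - entries_per_engine) // match_stall

print(served_per_tick(256))   # TPE: 256 timers/engine -> 8 timer expiries
print(served_per_tick(128))   # UPE: 128 users/engine -> 34 matches, well above the 8 needed

# Transmit Data Pump ceiling: one 32-bit word per 50 MHz clock cycle
print(50_000_000 * 32 / 1e9)  # 1.6 Gbps
```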

With respect to the Generator throughput analysis presented in chapter 7, the
scheduling of 8 transmit TEs can lead to the generation of at least 800Mbps of
network traffic. Therefore, we can safely claim that the proposed architecture supports
not only 400Mbps constant bit rate traffic generation, but 800Mbps as well.

On the ingress side, the Analyzer throughput is governed by the clock cycles needed to
isolate and parse all header fields. Specifically, configuring the Statistics Fields Parser
with a transfer burst size equal to the total length of the headers to be fetched, and
starting to parse header fields at an offset os after retrieving the first header word, will
result in the following parsing time pt

pt = os + number of fields + parsing decisions delay,

due to the fact that each parsing field can be isolated and stored within a single clock
cycle. With number of fields equal to 18, os equal to 9 - the words comprising an entire
Ethernet frame and IP header - and parsing decisions delay equal to 13 - a conservative
value, taking into consideration that parsing decisions are needed only for PPP and L2TP
headers - 40 clock cycles are required to entirely parse a packet incorporating the full Gi
protocol stack. According to the throughput analysis included in chapter 7, if the Analyzer
can parse 2 frames within 100 clock cycles, it will be able to parse 6 frames within a tick
and therefore sustain 800Mbps of incoming network traffic.
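The parsing-time arithmetic above can be sketched as follows; the identifier names are assumptions chosen for the illustration, not names from the design.

```python
# Parsing time for one packet carrying the full Gi protocol stack.
OFFSET = 9             # os: words of Ethernet frame and IP header fetched first
NUM_FIELDS = 18        # header fields, each isolated in a single clock cycle
DECISIONS_DELAY = 13   # conservative delay for PPP/L2TP parsing decisions
TICK_CYCLES = 300      # clock cycles available in one tick

pt = OFFSET + NUM_FIELDS + DECISIONS_DELAY
assert pt == 40        # 40 cycles to parse one packet entirely

# parsing 2 frames per 100 cycles scales to 6 frames per 300-cycle tick
frames_per_tick = 2 * (TICK_CYCLES // 100)
assert frames_per_tick == 6
```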


9. Epilogue
9.1 Conclusions
The goal of the present thesis project was to design and implement an interfacing and
measuring system for the UMTS Gi interface, to be used for networking equipment stress
and protocol conformance testing, as well as network link stress testing. The Gi system
comprises two sub-systems, the Generator and the Analyzer. The Generator is
responsible for generating constant bit rate traffic up to the network link capacity,
whereas the Analyzer must retrieve and parse incoming traffic, maintaining statistics
based on packet header fields. The two sub-modules were to be implemented in two
FPGA devices and had to be configurable in terms of FPGA resource utilization and
total throughput potential, in order to satisfy both higher and lower throughput
requirements and to be deployable in testing hardware having different capabilities.

Initially a functional decomposition was performed for the modules, in order to identify
the basic tasks they should execute. Thereafter, parameterized hardware architectures
were designed and implemented using the VHDL hardware description language. The
architectures were configurable in terms of parallel engines deployed to carry out each
task. The actual number of engines for any particular task was based on the primary
throughput requirements imposed on the Gi system. Specifically, the system would
incorporate 4 10/100Mbps Ethernet ports and should therefore be able to fill up all 4
links with traffic and parse an aggregate of 400Mbps of incoming traffic.

The instantiation of the architectures followed all major steps of an HDL-based design flow,
passing through the stages of simulation, synthesis and implementation. Simulation was
used to verify the functionality of the Generator and Analyzer sub-modules, as well as
their overall combined operation. For the purposes of simulation, test scenarios were designed
based on both arbitrary and realistic data, obtained from valid 3GPP documents and test
beds, whenever that was feasible. Once the system was proven functional, the synthesis
and implementation tools available in the Xilinx ISE software were used to realize the
design in the FPGA resources and examine the limitations of the architecture in terms of
speed.

The final results of the thesis project clearly indicate that all goals have been achieved.
The Gi system architecture designed and implemented is able to sustain the 400Mbps
throughput, operating at a frequency predefined by the test tool requirements.
Additionally, the exact same design can easily scale up to at least 800Mbps.

9.2 Future Work


Because the development of the test tool has low priority in Keletron's short-term plans,
the implementation was considered complete with the conclusion of the place and route
process. Having a verified design that can be placed inside the FPGAs of the test tool
board was judged sufficient for the time being. However, even the most extensive
simulations may not be able to identify possible system bottlenecks and/or points of
failure that may occur under real operating conditions. Therefore, it would be particularly
useful to deploy the Gi interfacing and measuring system in an actual mobile network,
verify its behavior and collect live measurements.

Additionally, the implemented system establishes a solid foundation on which to build
additional features. Examples of such features, and perhaps the
most useful ones, would be the incorporation of a performance evaluation architecture
facilitating the collection of QoS measurements - round-trip delay times, packet loss
percentages, maintenance of statistics on a user level - and the addition of a “reactive
response” mechanism. The “reactive response” mechanism should comprise the
intelligence necessary to trigger the generation of valid responses to packets received.
The main application area of this kind of feature would be the testing of signaling
protocols over the Gi interface.


Appendix A
A.1 Asynchronous Transfer Mode Protocol
The Asynchronous Transfer Mode (ATM) protocol is a connection-oriented protocol,
developed by the International Telecommunication Union. A connection-oriented
service is defined as [36]

“A service in which a connection-setup procedure must be implemented before data can
be exchanged between two users”

The principle upon which this class of services builds is circuit switching,
where a fixed data path is established prior to the data transfer between the endpoints;
the data path is maintained throughout the duration of the particular service. For
example, traditional Public Switched Telephone Network calls incorporate mechanisms,
prior to the actual voice transfer, to establish a fixed circuit-switched connection
between the caller and the callee, which remains active for the duration of the phone call.
The advantage of connection-oriented services is the fact that they ensure guaranteed
delivery of information in the same sequence it was transmitted. Therefore, they are
better suited to applications that have strict timing constraints. What distinguishes the ATM
protocol is the fact that it provides connection-oriented services without using permanent
connections, though it also allows for such a capability. This kind of functionality is
achieved using virtual connections, thus avoiding the need to dedicate network
bandwidth to users. ATM packets themselves carry information relating user packets to
the virtual connection they belong to. This approach leads to better
network bandwidth utilization, due to the fact that network resources are not permanently
tied to connections between any two users; users request and consume bandwidth only
when they need to. Virtual connections can be multiplexed over the same network link
while still providing a level of service equal to that of circuit-switched connections.

A better understanding of ATM necessitates the introduction of its counterpart,
Synchronous Transfer Mode (STM). STM is a circuit-switched mechanism used by
telecommunication backbone networks to transfer packetized voice and data across long
distances. In STM, endpoints allocate and reserve network bandwidth for the entire
duration of the connection, even when they do not need to transmit data. The data
transportation mechanism across an STM backbone divides the bandwidth of network
links into a fundamental unit of transmission, called the time-slot. The number of time-slots
on a link is fixed, and there is a 1:1 relationship between the link “users” and time-slots.
Each user is assigned a unique time-slot, during which it can transmit data. In case
the user has no data to transmit within his time-slot, network bandwidth remains unused.
From the above, we can outline two major disadvantages of STM

• the support for a limited number of simultaneous connections,

• the significant wastage of bandwidth.


ATM addresses both limitations by means of virtual circuits. Virtual circuits allow for
more simultaneous connections, using an efficient connection multiplexing scheme, and
alleviate bandwidth wastage, assigning network bandwidth to users only when they have
data to transmit.

Another significant feature of ATM is its ability to efficiently transfer user traffic having
differentiated characteristics. For example, real-time traffic such as voice and high
resolution video can tolerate loss within particular limits, but has very strict delay
requirements. On the other hand, non-real-time traffic such as computer data can
tolerate delay within particular limits, but has very strict loss requirements. ATM
accommodates both kinds of traffic, using two key mechanisms

• the small, fixed-size packet size,

• the definition of service categories.

ATM packets - referred to as cells - are 53 bytes long, carrying 48 bytes of payload and
a 5-byte header. The header part conveys the data necessary for mapping the packet to
a virtual connection, as well as information on flow control. Flow control in ATM is not
based on end-to-end feedback; it is handled in hardware during the packet route to its
destination. Flow control is applied dynamically as the packet traverses the network,
based on the conditions encountered and the category of service the packet belongs to.
One of the flow control mechanisms is a bit inside the cell header. The specific bit can be
set by ATM switches20 as they move the packet around the network, if they judge that
the packet should be discarded. Dropping ATM packets requires that this particular bit
is set and that they belong to a service category which allows packets to be dropped21.
Furthermore, the small packet size ensures a minimum impact on applications intolerant
to packet loss.

A.2 ATM Service Categories


ATM service categories are in essence Quality of Service (QoS) classes, establishing a
minimum set of traffic transmission requirements between the endpoint and the ATM
network. The QoS class user traffic will belong to is negotiated between the endpoint
and the network, during the connection setup procedure. There are four QoS classes
[36].

• Constant Bit Rate (CBR)


Data is moved around the network at a constant bit rate; the amount of
bandwidth necessary is reserved and guaranteed by the network, as in circuit-
switched connections.

• Variable Bit Rate (VBR)


VBR class is similar to CBR in that it reserves bandwidth for user traffic. The
difference is that reservation is not made for peak traffic rate, but allows for
peaks within specific limits - a feature called burst tolerance. VBR is further
divided into real time and non-real time, enabling the network to give precedence
to real-time VBR traffic.

20
An ATM switch performs functions analogous to those of a router, forwarding packets to their destination.
21
The service category does not guarantee that the packet will not eventually be dropped; it just ensures
that packets having lower priority will be dropped prior to the particular packet.

• Unspecified Bit Rate (UBR)


Data is accepted with no constraints and is only transmitted if bandwidth is
available, without reservations and/or guarantees.

• Available Bit Rate (ABR)


Similar to UBR, with the difference being that the network provides congestion
feedback to endpoints. If endpoints react appropriately to the feedback, there will
be low data loss due to congestion.

A.3 ATM Adaptation Layers


Protocols utilizing ATM services are unaware of ATM internal workings. They must be able
to operate over any underlying transport mechanism and must therefore be independent of the
actual one used. ATM Adaptation Layers (AALs) are delegated the task of transforming
upper layer protocol data into a format appropriate for ATM to transmit. The ATM layer
receives 48-byte chunks of data from the particular AAL used and encapsulates them in
ATM packets. Due to the varying characteristics of traffic that can be transported using
ATM, there are five AALs - named AAL1 to AAL5 - each incorporating mechanisms that
suit the characteristics of the QoS class it is destined for. The most widely used ones,
and the ones utilized by UMTS, are AAL-2 and AAL-5.

AAL-2 is used to handle VBR traffic. VBR, as we mentioned earlier, is similar to CBR
traffic but tends to be bursty in nature. Data comes in bursts and must be transmitted at
the peak rate of the burst; however, the average time between bursts may be large and
randomly distributed. AAL-2 uses 4 of the 48 available ATM cell payload bytes to
carry an “AAL-2 specific” header, thus limiting the effective capacity of the ATM cell to
44 bytes of payload (Figure A-1). Therefore, it is particularly suited to low-rate voice
traffic with compression and silence suppression, which produces small payloads fitting
inside the ATM cell. Furthermore, it incorporates the capability of multiplexing voice
packets from different users on the same ATM virtual connection, by means of the “AAL-
2 specific” header.

Figure A-1 AAL2 vs AAL5 process

AAL-5 is the primary AAL for data, handling connection-oriented and connectionless
data traffic. It uses different mechanisms than AAL-2 to ensure better utilization of
the ATM cell payload. The aim of the AAL-5 designers was to develop a lightweight
adaptation layer which would enable the ATM layer to carry data efficiently. The
functions of AAL-5 are applied to an entire higher-layer packet and not per ATM cell
payload (Figure A-1).

An 8-byte “AAL-5 specific” trailer is appended to the packet received from higher layer
protocols, along with padding to ensure that the packet plus trailer plus padding is an
integral number of 48-byte chunks. The entire packet is then separated into 48-byte
segments and handed over to the ATM layer to construct cells. The result of the
aforementioned scheme is that AAL-5 introduces far less overhead than AAL-2. The
second mechanism employed by AAL-5 is a bit in the ATM cell header, indicating
whether the particular cell is the last one of a higher layer protocol packet or if more cells
follow. Finally, we should note that AAL-5 does not provide the multiplexing capabilities
of AAL-2; it assumes that a connection belongs to a particular user and that data from
the user is sequential.
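The trailer-and-padding rule can be sketched numerically. This is a minimal illustration of the segmentation arithmetic just described, not code from the thesis; the function names are invented for the example.

```python
import math

CELL_PAYLOAD = 48   # bytes of payload carried by each ATM cell
AAL5_TRAILER = 8    # bytes of the "AAL-5 specific" trailer

def aal5_cells(packet_len):
    """ATM cells needed to carry one higher-layer packet."""
    return math.ceil((packet_len + AAL5_TRAILER) / CELL_PAYLOAD)

def aal5_padding(packet_len):
    """Padding bytes making packet + trailer + padding a multiple of 48."""
    return aal5_cells(packet_len) * CELL_PAYLOAD - packet_len - AAL5_TRAILER

assert aal5_cells(40) == 1 and aal5_padding(40) == 0     # 40 + 8 fills one cell exactly
assert aal5_cells(100) == 3 and aal5_padding(100) == 36  # 100 + 8 = 108 -> padded to 144
```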

A.4 ATM and the UMTS Network


UMTS selected ATM as its transmission technology because it possesses the
ability to carry different types of traffic on the same medium in an integrated fashion, as
well as to provide the QoS guarantees needed for telecommunications. AAL-2 was
selected to carry voice and video and AAL-5 to carry packet data, as these were the
AALs whose properties best suited such tasks. In order to
understand the services offered by ATM to the UMTS network, we will present the
position of the ATM protocol in the Open Systems Interconnection (OSI) reference
model.

ATM cannot be directly mapped to an OSI layer. On the one hand, it performs functions
corresponding to OSI layer 2 - the data link layer - but on the other hand it also performs
higher-layer protocol functions such as flow control and routing; recall from previous
subsections that the ATM cell header contains routing information. The common practice
is to consider ATM along with the AALs as a data link layer protocol.

The main task of the data link layer is to present an error-free transmission medium to
the network layer, by detecting and correcting errors that may occur during information
transmission by the physical layer. As we stressed earlier, ATM provides more than just
that and in conjunction with its routing features can constitute an efficient mechanism to
transport data across a network.

UMTS uses ATM as the transport mechanism. All UMTS higher layer protocols offer
their services using the ATM layer service. Circuit-switched, packet-switched and
signaling traffic is carried inside cells within the UTRAN (Figure A-2). Specifically, in the
Iu-CS interface AAL-2 is utilized to support connections with variable bit rate and
minimal delay in a connection-oriented mode. AAL-5 is used in the Iu-PS interface to
carry packet-switched user traffic. Furthermore, AAL-5 is used throughout the UTRAN as
the adaptation layer for the transport of signaling and control information.

This section did not deal with the internals of ATM, such as the packet header format or
Virtual Paths and Virtual Channels. We attempted to provide an abstract description,
focusing on the actual functionality that ATM offers. There is much more to ATM than
what is described here. A very good introduction to ATM internals is [37], whereas an in-depth
description analyzing all ATM aspects can be found in [38].

Figure A-2 UTRAN ATM protocol stacks


Appendix B
B.1 GSM Phase 2+ Core Network Basic Entities and
Interfaces
The UMTS Core Network (CN) is based on the enhanced GSM Phase 2+ Core Network.
This subsection provides a description of the GSM Phase 2+ CN basic network entities
and interfaces, leaving aside Short Messaging Service and Customized Application for
Mobile Enhanced Logic subsystems.

The CN is separated into Packet-Switched (PS) and Circuit-Switched (CS) domains [39].
The two domains differ in the way they support user traffic. The CS domain refers to the
set of CN entities that enable circuit-switched connections for user traffic, including
entities that support the related signaling. Circuit-switched connections are connections
for which dedicated network resources are allocated at connection establishment
and released at connection release. The PS domain refers to the set of CN entities that
enable packet-switched connections for user traffic, along with entities that support the
related signaling. Packet-switched connections transport user traffic using autonomous,
independent information transfer units, the packets. Each packet is self-describing and
contains all information necessary to reach its destination independently of any other
packet. The description of the GSM CN that follows is based on the division of the CN
into a CS and a PS domain, grouping entities and interfaces according to the domain
they belong to.

B.2 CS and PS Domains Common Entities

The particular entities can be generally grouped into three categories

• entities related to mobile subscriber location,

• entities supporting security services,

• entities related to Short Messaging Service functionality - not of interest to this document.

In order for communication to be established between a subscriber and the mobile
network, information related to subscriber location is necessary. The Home Location
Register (HLR) and the Visitor Location Register (VLR) are responsible for keeping track
of the actual location of mobile stations. Practically speaking, the HLR and VLR are
databases holding - amongst other data - location information. The HLR is in charge of
mobile subscriber management, storing subscription and location data which enable
routing and charging of calls for a particular subscriber. The VLR, on the other hand,
stores the same kind of information as the HLR for mobile stations roaming22 in the area
it is responsible for. The VLR and HLR cooperate in order to allow the proper handling of
calls related to mobile stations roaming in the VLR area. The number of HLRs and VLRs
in a mobile network is not fixed. For example, a mobile network may have one or more
HLRs, depending on the number of subscribers, the capacity of the equipment, and the
organization of the network.

Security services include user traffic encryption, mutual authentication between the
subscriber and the mobile network and fraud protection. Security functionality in the CN
is implemented by means of the Authentication Center (AuC) and the Equipment Identity
Register (EIR) entities, each one addressing a subset of the aforementioned
considerations. The AuC is involved in performing mutual authentication between the
subscriber and the network, deciding the network services the mobile station has access
to, and providing security keys used in cryptographic operations. The EIR employs a very
simple principle to provide fraud protection. It stores a list of the mobile equipment used in
the network it is responsible for and further organizes that information into three sub-lists

• White

• Grey

• Black

Network access for users holding a particular piece of mobile equipment depends on the
category the equipment belongs to and the access policy enforced by the network operator.
“Black listed” equipment, for example, is considered stolen, and the network operator may
deny network access to the subscriber using it. Mobile equipment is uniquely
identified by means of the International Mobile Equipment Identity (IMEI).
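As a purely illustrative sketch of the lookup principle just described - the IMEI values and the list contents below are invented for the example, and real EIR databases and operator policies differ - an EIR check might look like:

```python
# Hypothetical EIR sub-lists keyed by category (data invented for illustration).
EIR_LISTS = {
    "white": {"490154203237518"},  # equipment cleared for normal service
    "grey":  {"352099001761481"},  # equipment to be tracked by the operator
    "black": {"356938035643809"},  # equipment reported stolen
}

def classify_imei(imei):
    """Return the sub-list an IMEI belongs to, or 'unknown' if unregistered."""
    for category, imeis in EIR_LISTS.items():
        if imei in imeis:
            return category
    return "unknown"

# a "black listed" IMEI may lead the operator to deny network access
assert classify_imei("356938035643809") == "black"
assert classify_imei("123456789012345") == "unknown"
```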

B.3 CS Domain Entities


The CS domain delivers circuit-switched services to the mobile subscribers, utilizing the
functionality of its individual entities. First of all, the Mobile-services Switching Center
(MSC) performs all functions necessary to handle the circuit-switched services
to and from the mobile stations. It constitutes the interface between the radio network and
fixed networks. The MSC performs all the switching and signaling functions for mobile
stations located in a geographical area designated as the MSC area. The MSC is fairly
complex, as during transactions it has to take into account the allocation of radio
resources and the mobile nature of the subscribers. Additionally, it has to perform the
procedures required for user location registration and for handover23.

On top of the MSC resides the Gateway Mobile Switching Center (GMSC). The GMSC
stands at the edge of the mobile network and acts as a call router whenever calls are
destined for stations located outside an MSC area. It is the interface of the mobile network
to other mobile networks and to external circuit-switched networks such as the Public
Switched Telephone Network (PSTN). If an MSC cannot interrogate the HLR about the
location of a mobile subscriber, it delivers the call routing to the GMSC. Last but not
least comes the InterWorking Function (IWF), a functional entity associated with the MSC.
The purpose of this entity is to provide the functionality necessary to allow interworking
between a PLMN and external circuit-switched networks - PSTN, ISDN, etc. - referred to
as fixed networks. The IWF functions depend on the services and the type of the external
network. What the IWF is required to do is to convert the protocols used in the mobile
network to those used in the interfacing fixed network. In case the mobile and fixed
network services are compatible, an IWF is not needed.

22
Roaming refers to a situation where the network serving the mobile subscriber is different from the
network with which the subscriber has signed the service agreement.
23
Handover is the process in which a cellular phone is handed over from one cell to the next in order to
maintain a radio connection with the network.

B.4 PS Domain Entities


Recall that the PS domain is delegated responsibility for packet-based services, such as web
browsing and e-mail. The term used for the PS domain entities is GPRS Support
Nodes (GSNs), due to the fact that they were introduced in the General Packet Radio
Service network architecture to support packet-switched services. The GSNs perform
all functions necessary to handle packet transmission to/from mobile
stations. A detailed description of the GSNs is provided in the GPRS network section of
this chapter. For the time being, we will only mention that there are two GSNs: the
Serving GSN (SGSN) and the Gateway GSN (GGSN).

B.5 CS and PS Domain Common Interfaces


This particular class of reference points incorporates interfaces between entities common
to both domains, as well as interfaces between entities of the PS and CS domains.
Specifically, the Home Location Register and the Authentication Center communicate
over the H reference point, in order for the HLR to retrieve authentication and ciphering
data related to mobile subscribers. The MSC and the SGSN utilize the Gs interface to
exchange subscriber location information, paging requests, etc.

B.6 CS Domain Interfaces


CS domain reference points include interfaces between the entities offering circuit-
switched functionality and entities common to both CS and PS domains, as well as
interfaces between the CS domain entities. The Mobile Switching Center and its
associated Visitor Location Register communicate over the B-interface. As mentioned
earlier, the VLR is the location and management database for the mobile subscribers
roaming in the area controlled by the MSC. Whenever the MSC needs data related to a
given mobile station roaming in its area, it interrogates the VLR. In the reverse path,
whenever a mobile station roams to another location area, it initiates a location updating
procedure with the MSC in charge of that area. The corresponding VLR then retrieves
from the MSC and stores current location data.

The HLR and VLR communicate over the D-interface. Data related to mobile station
location and subscriber management are exchanged via the D-interface. The main
service provided to the mobile subscriber is the capability to set up or receive calls
within the areas served by the HLR and the VLR. Data exchanges between the two
entities may occur when a mobile subscriber requests a particular service, when
he/she wants to change some data attached to his/her subscription or when some
parameters of the subscription are modified by administrative means.

Communication between VLRs under different operating authorities takes place over the
G-interface. For example, when a mobile subscriber moves from an area one VLR is
responsible for to an area controlled by a different VLR, the subscriber will have to register
his current location. The new VLR may need to retrieve parameters related to user
identification and authentication from the old VLR.

The Gateway MSC utilizes the C-interface towards the HLR, so as to obtain routing
information for a call or a short message directed to a particular subscriber. The MSC uses
the F-interface towards the Equipment Identity Register, in order to enable the EIR to
verify the status of the International Mobile Equipment Identity retrieved from a mobile
station. Finally, MSCs exchange data over the E-interface. When a mobile station moves
from an area a particular MSC is responsible for to another one controlled by a different
MSC, a handover procedure has to be performed so that the user connection to the
mobile network is not interrupted. The MSCs exchange any data necessary to perform
this operation over the E-interface. After the handover operation has been completed,
the MSC servers exchange information related to Iu interface signaling.

B.7 PS Domain Interfaces


PS domain reference points are referred to as GPRS interfaces, due to the fact that they
were introduced along with GPRS Support Nodes to offer GPRS packet-switched
services. GPRS interfaces include interfaces between the GSNs and entities common to
both CS and PS domains, as well as interfaces between the GSNs. The SGSN and the
HLR communicate location information to each other by means of the Gr-interface. The
SGSN informs the HLR of a mobile station location, while the HLR sends to the SGSN
all data needed to support mobile subscriber packet-based services. Exchanges of data
may occur when a mobile subscriber requests a particular service, when he/she
wants to change some data attached to his subscription or when some parameters of
the subscription are modified by administrative means.

The Gn and Gp interfaces are used to support mobility between the SGSN and the
GGSN. The Gn-interface is used when the GGSN and the SGSN belong to a network
operated by a single authority, whereas the Gp-interface is used if the GGSN and the
SGSN are located in networks operated by different authorities. The functionality of the
Gp-interface is similar to that of the Gn, including extra security functionality based on
mutual agreements between operators. The SGSN and the EIR utilize the Gf interface in
order for the EIR to verify the status of the IMEI retrieved from a mobile station. Last but
not least, an optional interface between the GGSN and the HLR - the Gc-interface - has
been defined to provide a signaling path for the GGSN to retrieve information about the
location and supported services of a mobile subscriber.

B.8 External Networks Interfaces


UMTS is interconnected with existing networks, such as the fixed telephone network and
the Internet in order to broaden the range of services provided to mobile subscribers.
External networks interfaces include two distinct reference points: one for the
communication of the CS domain with fixed circuit-switched networks and one for the
interconnection of the PS domain with fixed packet-switched networks. External
circuit-switched networks interface with the MSC by means of an interface specific to the
call control mechanisms of the fixed network. External packet-switched networks interface
with the GGSN using the Gi interface. In order to clarify matters, the following figure
(Figure B-1) depicts a typical UMTS network configuration, integrating the most
important UMTS functional entities and their corresponding interfaces.


Figure B-1 UMTS network architecture


Appendix C
C.1 PCI Industrial Computer Manufacturers Group
The PCI Industrial Computer Manufacturers Group (PICMG) [40] is a consortium
consisting of more than 450 companies which collaboratively develop open
specifications for high performance telecommunications and industrial computing
applications. The consortium was originally formed24 to extend the Peripheral
Component Interconnect (PCI) standard. PICMG is responsible for the development of
the PCI 3U and 6U Eurocard form factor, commonly known as Compact PCI. Apart from
the formation of the Compact PCI basic specification, which is basically a re-engineering
of the PCI bus, PICMG has issued additional specifications to provide enhanced
features for Compact PCI. Examples of such specifications include the Compact PCI
Hot-Swap architecture as well as the Compact PCI Packet Switching Backplane.

C.2 Compact Peripheral Component Interconnect

The Compact Peripheral Component Interconnect (cPCI) is a specification targeting
industrial computer systems. Industrial computer systems are high-speed computers
mainly used for real-time machine control, industrial automation, networking and
telecommunications equipment. cPCI was developed specifically to address the
particular requirements characterizing applications that deploy such computers and
primarily their need for high processing capacities, increased input/output capabilities
and high bandwidth hardware interconnection schemes.

The hardware architecture used in a computer system can be viewed using a layered
approach, consisting of the following four layers

• physical,
defines the board format, connector type and positions

• electrical,
addresses issues such as technology used to send and receive data, pin
assignment and clock speeds

• protocol,
describes how transfers, arbitration and responses to interrupts are made

• integration,
defines how the system can be integrated as a peripheral module of a larger
system. Issues to consider include how boards are mapped to the address
space, how they are detected and finally the way they are configured.



With respect to the physical layer, the cPCI shape and format is based on an already
existing standard, the Eurocard industry standard. The Eurocard form factor defines two
board sizes

• 3U (100mm x 160mm) with one connector

• 6U (233mm x 160mm) supporting the 3U and 2 extra connectors

The 3U board connectors provide a total of 220 pins. The 6U board adds another
220 pins, identical to the 3U, plus a 95-pin connector. The connectors are of socket-pin
type, facilitating board interconnection without cabling.

The electrical and protocol layers of the cPCI specification are an extension of the
Peripheral Component Interconnect (PCI) bus aiming at the needs of the industrial
environment. The PCI specification defined the electrical requirements of the bus, the
protocol governing PCI bus transactions, the PCI bus speed, as well as flow control to allow
it to interoperate with slower peripheral devices. PCI uses multiplexed address and data
lines and is best suited to peripheral devices that need to exchange large blocks of data.
PCI busses are 32 bits wide and can operate at 33 or 66MHz. Additionally, there is an
extension providing a 64-bit wide bus. The cPCI specification was originally electrically
compatible with the 32 and 64-bit versions of the PCI specification, operating at 33MHz.
This led to a cPCI bus maximum throughput of approximately 132Mbytes/sec for the
32-bit mode and 264Mbytes/sec for the 64-bit mode. The throughput of the cPCI bus was
further increased using 32 and 64-bit mode operation at 66MHz, and there also is a
specification which incorporates PCI Express into cPCI.
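As a quick sanity check on these figures, peak burst throughput is simply the bus width in bytes multiplied by the clock rate. The following sketch (the function name is ours) reproduces the quoted numbers:

```python
def pci_peak_throughput(bus_width_bits: int, clock_mhz: int) -> int:
    """Peak burst throughput of a PCI/cPCI bus in Mbytes/sec.

    During a burst, one word of bus_width_bits is transferred per clock
    cycle, so the peak rate is the width in bytes times the clock rate.
    """
    return (bus_width_bits // 8) * clock_mhz

# Approximate peak rates discussed in the text:
print(pci_peak_throughput(32, 33))  # 132 Mbytes/sec, 32-bit mode at 33MHz
print(pci_peak_throughput(64, 33))  # 264 Mbytes/sec, 64-bit mode at 33MHz
print(pci_peak_throughput(64, 66))  # 528 Mbytes/sec, 64-bit mode at 66MHz
```

These are theoretical peaks; sustained throughput is lower due to arbitration and address phases.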

C.3 cPCI vs PCI


The PCI bus has been the de facto standard for system designers in the last few years,
sitting at the heart of modern desktop computers. It has been defined by Intel as a local
bus providing a high-bandwidth link between the CPU and high-speed peripheral
devices. However, the PCI specification was formed with desktop computers in mind and
therefore could not meet the demands of industrial computer systems. Industrial
computers must be capable of operating reliably under any circumstances, having a
mean time between failures measured in tens of thousands of hours. They should
tolerate heat, dirt, intense shocks and vibrations. Furthermore, even in the case where
there are failures, the system out-of-service time is very important when considering
high-capacity servers. What the cPCI specification did was to maintain all powerful
characteristics of PCI, such as the high bandwidth and the plug-and-play capability, and
complement it with features essential for industrial computers.

From the mechanical point of view, cPCI boards are designed for front-loading and
removal from the rack and they are held firmly in position by the socket-pin type
connectors, special card guides on both sides and a face plate which screws into the
rack. The aforementioned factors constitute a very reliable anti-shock and anti-vibration
scheme compared to the “loose” card-edge PCI connector and the way a PCI
card is mounted on a motherboard. Additionally, it enables easy insertion and removal of
boards on the rack.


From the electrical point of view, the cPCI specification expanded the PCI electrical
specification to address its physical limitations. Due to physical line transmission issues,
the PCI bus can accommodate up to four slots, thus being able to interconnect only four
peripheral devices. However, more than four slots can be supported by means of a PCI-
to-PCI bridge. A PCI-to-PCI bridge acts as a PCI “repeater”, connecting two PCI bus
segments together. Even in this case the default four slot support is not enough for
industrial applications, where usually a large number of different I/O functions must be
performed and therefore a large number of expansion cards is needed. The cPCI bus
has been engineered to support up to 8 slots in a bus segment when operating at
33MHz, and up to 5 slots when operating at 66MHz while preserving the PCI bridge
expansion scheme. This is a major improvement compared to the PCI bus which can
support up to 4 slots at 33MHz and up to 2 slots at 66MHz.

The cPCI specification has also established a mechanism for insertion and removal of
expansion cards while the system is up and running. The particular feature is referred to
as Hot-Swap and utilizes staged backplane connector pins that allow the board to be
powered while its PCI signals are held in a high-impedance state, before the PCI pins
make contact with the backplane. Once the board is fully inserted, the PCI bus is activated and
the board is initialized. The Hot-Swap mechanism eliminates system out-of-service time,
as in the case where a particular expansion board fails, it can be removed and re-
inserted to the cPCI system without disturbing its operation. The service provided by the
particular board will of course not be available, but the rest of the boards will remain
unaffected. The Hot Swap specification is based on a similar specification for the PCI
bus, called Hot Plug. The major difference between the two specifications is that in cPCI
all circuitry required to enable Hot Swap functionality is located on the cPCI expansion
cards, whereas in PCI Hot Plug the circuitry is located on an active motherboard.

PCI offered a limited number of interconnections for a PCI board to communicate with
other boards, mainly the data bus. This was clearly unsuitable for high-speed
applications, which can benefit from a large number of Input/Output (I/O) pins by
establishing additional data paths apart from the system bus. The cPCI specification
offers a plethora of interconnections with great flexibility in the functions they provide. A
cPCI board can have any number between 220 and 535 pins for I/O operations. The
pins are organized in 5 connectors, named J1 to J5. The J1 connector is present on all
cPCI boards and implements a 32-bit PCI bus interface. J2 can be present in both 3U
and 6U cards (Figure C-1). When present, it is used to implement either a 64-bit PCI
bus interface or user-defined interfaces to external circuitry. J1 together with J2 offer a
total of 220 pins. J4 and J5 on the other hand are the equivalent of J1 and J2 at the
upper half of the board. They also have a total of 220 pins and their functionality is user-
defined. Finally, J3 is a 95-pin connector used solely for user-defined I/O. As we can see
from the above, cPCI offers a large number of user-defined pins, leaving the functionality
they should provide to the board designer. Apart from providing proprietary functionality,
user-defined pins are commonly used to realize industry standard busses, such as VME.


[Figure: a 3U board (100mm x 160mm) carries the J1 cPCI bus connector and J2 (220
pins total with J1, J2 required for the 64-bit cPCI bus); the 6U board (233.35mm) adds
the 95-pin J3 for user-defined I/O, plus the optional J4 (user-defined I/O or CT bus)
and J5 (220 pins total with J4)]

Figure C-1 3U, 6U cPCI format and corresponding connectors

Furthermore, there is a cPCI specification defining the implementation of a standard bus
used for computer telephony applications on the J4 connector, the Computer Telephony
(CT) bus.

Finally, cPCI maintained the auto-configuration capabilities of PCI. PCI utilizes a
predefined set of registers that contain information on the device identity as well as
software programmable parameters, such as address maps or interrupt types, enabling
the CPU to automatically detect and configure a device present on the bus.
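As an illustrative sketch of this mechanism, the fragment below decodes the first identity fields of a standard PCI configuration header. The field offsets and little-endian byte order follow the PCI configuration header layout; the function name and the sample byte values are hypothetical:

```python
import struct

def parse_pci_config_header(cfg: bytes) -> dict:
    """Decode the first fields of a standard PCI configuration header.

    PCI configuration space is little-endian; the first two 16-bit words
    are the Vendor ID and Device ID used for automatic device detection.
    """
    vendor_id, device_id = struct.unpack_from("<HH", cfg, 0)
    # Offset 0x10 onwards holds the Base Address Registers (BARs) that
    # the host programs to map the device into the address space.
    bar0, = struct.unpack_from("<I", cfg, 0x10)
    return {"vendor_id": vendor_id, "device_id": device_id, "bar0": bar0}

# Hypothetical 20-byte header fragment: vendor 0x10EE, device 0x0300,
# BAR0 already programmed to 0xF0000000.
header = bytes([0xEE, 0x10, 0x00, 0x03]) + bytes(12) + bytes([0x00, 0x00, 0x00, 0xF0])
info = parse_pci_config_header(header)
print(hex(info["vendor_id"]), hex(info["device_id"]))
```

In a real system the operating system reads these registers through configuration cycles rather than from a byte buffer; the sketch only shows the field layout.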

C.4 cPCI System Architecture


When using the term cPCI system, we refer to a complete cPCI computer, in the same
sense that one refers to a desktop computer. Like a desktop computer, a cPCI system
incorporates the corresponding enclosures, motherboard, and central processing unit.
Specifically, the basic building blocks of a cPCI system are

• a cPCI compatible enclosure,

• the backplane, which actually implements the cPCI bus and contains the slots in
which peripheral boards are inserted,


• the Single Board Computer, delegated general-purpose functionality and system
operation coordination,

• the system-specific peripheral boards, usually deployed in I/O operations and
processing-intensive applications.

C.4.1 cPCI Enclosures

As cPCI systems are meant to operate in industrial environments, they impose far more
stringent requirements on the enclosure which “hosts” them. Additionally, the flexibility
provided by the cPCI specification, both in terms of board sizes and the number of cPCI
slots that can reside on a backplane, leads to various sizes of enclosures, tailored to
the needs of particular ranges of applications. Furthermore, enclosures can be portable,
mounted in a larger rack, or stand-alone.

When choosing an enclosure for a cPCI system, essential issues to consider are its size,
powering and cooling characteristics due to the fact that they can dramatically affect the
system performance and reliability. Further considerations include [41]

• Electrical shielding.
Electronic components emit radiation during their operation. This radiation can
affect the operation of other components in the same system or even in adjacent
systems. The enclosure should neither allow excessive emissions to escape nor
admit external radiation that could interfere with the system operation.

• Vibration and shock protection.
The enclosure should protect the hosted system from failures due to shocks and
vibrations, which are definitely part of an industrial environment.

• Fire risk analysis and resistance.
The hosted equipment should suffer as little damage as possible in the case of a
fire breakout. A very basic measure is to include as few plastic parts in an
enclosure as possible, as they are more easily damaged than their metallic
counterparts.

• Operational stability in an industrial environment.
An industrial environment is usually far from the typical desktop PC
environment; increased temperature, humidity and dust are just a few of the
differentiating factors.

In order to have a common ground as to whether enclosures meet certain requirements,
a number of standards have been defined. The scope of these standards is mainly to
establish a set of tests which an enclosure should successfully pass in order to meet a
minimum set of requirements. The Network Equipment-Building System (NEBS) is
perhaps the most well-known standard. NEBS addresses telecommunications systems
enclosures; it defines tests for a wide range of functionality a NEBS-compliant
enclosure should provide [42]. Most cPCI enclosures are compliant with either a subset
or the whole of the NEBS standard.


Recall from previous subsections that cPCI board sizes are defined in terms of “U” units.
Specifically, we mentioned that there are two board sizes, 3U and 6U. The “U” unit refers
to board and enclosure height, whereas their depth is expressed in millimeters. One
“U” is equal to 1.75 inches (44.45mm); the board itself is somewhat shorter than the
corresponding rack opening to leave room for card guides, so a 3U card is 100mm high
and a 6U card is 233.35mm high. Similarly, the enclosure sizes are defined in terms of
“U” units. Common enclosure sizes include 1U, 2U, 4U, 10U and 12U. Furthermore, apart
from their varying heights, enclosures accommodate the two different board sizes: there
are enclosures for 3U cPCI boards and enclosures for 6U cPCI boards.
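The relation between “U” units and the two board heights can be captured in a one-line sketch. Note that the fixed 33.35mm offset is our inference from the 100mm and 233.35mm figures, not a value stated in the text:

```python
def eurocard_board_height_mm(u: int) -> float:
    """Eurocard board height for a given "U" size.

    One rack unit is 1.75 inches (44.45mm); the board is shorter than
    the rack opening to leave room for card guides. The 33.35mm offset
    below is inferred from the 3U and 6U heights given in the text.
    """
    return round(u * 44.45 - 33.35, 2)

print(eurocard_board_height_mm(3))  # 100.0mm, the 3U board
print(eurocard_board_height_mm(6))  # 233.35mm, the 6U board
```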

The minimum responsibility of an enclosure is to protect the hosted cPCI system and to
provide an efficient powering and cooling scheme, facilitating the air flow through the
enclosure. Most enclosure vendors provide hot-swap capable fans and power supply
units so as to increase the availability of the hosted cPCI system. The bigger an
enclosure, the bigger the backplane that can be mounted on it and consequently the
higher the number of cPCI slots the cPCI system will include. For example, a 2U
enclosure usually supports 4 cPCI slots, whereas a 4U enclosure supports 8 cPCI slots.
cPCI enclosures are fabricated from steel or aluminum, incorporating as little plastic as
possible. Additionally, there are special-purpose portable enclosures. From the above, it
is clear that selecting the right enclosure depends on the needs of the design in terms of
number of cPCI slots, portability and the available space used to place the cPCI
enclosure.

Finally, let us note that although the open architecture of cPCI specification enables
system designers to purchase the backplane and the enclosure from different vendors, it
is common practice to acquire both from the same vendor.

C.4.2 Backplane
A backplane is a printed circuit board that has several connectors attached and
incorporates their interconnection scheme. Connector pins are interconnected to form
data paths transferring information from/to the boards mounted on the backplane. In
essence, the backplane is a backbone over which several cards communicate forming a
complete computer system [43]. Backplanes have the advantage of eliminating cables to
interconnect printed circuit boards. They offer greater reliability, as they do not suffer
from mechanical failures due to plugging and unplugging cables for board insertion and
removal. In general, they are categorized as active or passive. Active backplanes
include chips that can drive the system bus in order to perform a particular task. A
desktop computer motherboard is a typical example of an active backplane. Passive
backplanes contain no bus-driving circuitry; they just provide the boards'
intercommunication mechanism.

The cPCI backplane is passive. Its sole purpose is to establish communication between
boards mounted on its connectors. Systems based on passive backplanes are less likely
to fail and are therefore particularly suited to embedded and industrial computers, as
they increase their reliability. Due to the passive backplane scheme, the coordination
functions an active motherboard would perform are carried out by one or more CPU
cards, called Single Board Computers (SBCs).


Another industry-specific feature is mounting additional boards on the back of the
backplane. This feature is commonly referred to as rear panel I/O [44]. The rear panel
I/O approach is of increased importance to telecommunications equipment, as it enables
fast and easy replacement of an I/O card without disrupting cabling, which usually
includes a large number of phone lines or network connections. To support rear-panel
I/O the backplane utilizes double headed connectors, mirroring front board signals to the
rear of the board. Any components and connectors mounted on the rear transition card
can be mounted from the opposite side of the backplane to a front panel card. Rear
panel cPCI boards come in two flavors in terms of physical dimensions. Their standard
width is 80mm and their length can be either 100mm or 160mm. The narrow standard
width facilitated the construction of smaller cPCI systems and consequently smaller
enclosures, saving space for telecommunications equipment used in central offices.

[Figure: backplane with one system slot and several peripheral slots, each carrying the
P1-P5 socket connectors; double-headed connectors for rear panel I/O are optional]

Figure C-2 cPCI backplane with 7 peripheral slots

The backplane usually supports all 5 cPCI connectors on the front side. On the back
side, there is only support for the J3, J4 and J5 connectors, as they are the ones utilized
by rear panel boards. Recall that the J1 and J2 connectors are used to implement the
cPCI bus interface; they are not mirrored to the rear of the backplane due to physical line
limitations. The J1-J5 corresponding backplane socket connectors are referred to as P1-
P5 (Figure C-2).

The flexibility offered by the cPCI specification has led vendors to implement
backplanes “optimized” for the target application. Choosing the right backplane for a
cPCI system clearly depends on the functionality the system should provide. The board
size, the bus throughput demands, the number of expansion slots and the presence of
specific sub-buses are issues to consider. Backplanes can be either 3U or 6U size. The
cPCI bus speed can be 33MHz or 66MHz, whereas the number of expansion slots is 8
for a 33MHz bus and 5 for a 66MHz bus. Nevertheless, the number of slots can be
expanded by means of PCI-to-PCI bridges. Additionally, a backplane may have a
Computer Telephony bus interface on its P4 connector, or may provide the connector for
user-defined I/O. Board slots on the backplane are designated 1 through n, where
n is the total number of slots. In current commercial cPCI products, this number varies
between 2 and 16. Slot numbering starts at the left, as the backplane is viewed from the
front. The center-to-center spacing of the slots is 20.32 mm (0.8 inch). One slot is always
occupied by the Single Board Computer and is therefore called the system slot, whereas
the rest of the slots are used to mount expansion boards and are called peripheral slots.

C.4.3 Single Board Computer


A Single Board Computer (SBC) is a complete computer built on a single circuit board
[45]. Its functionality is similar to the one provided by a motherboard in a typical desktop
PC. The basic unit of an SBC is a microprocessor, surrounded by volatile and non-
volatile storage as well as a number of I/O interfaces.

In the context of cPCI, the SBC is the brain of the cPCI system. SBC responsibilities
include, but are not limited to, management and control operations. The SBC runs the
operating system and application-level software, commonly used to provide
configuration and initialization data to cPCI peripheral boards. Typical components of a
cPCI SBC are one or more microprocessors - referred to as host processor(s), static and
dynamic RAMs, flash memory, graphics controllers, Ethernet controllers and ports,
Universal Serial Bus and serial ports, as well as keyboard and mouse ports; more or less
the same circuitry a desktop PC motherboard includes in a cPCI form factor. SBCs also
provide IDE interfaces to connect hard drives or access non-volatile storage resident in
the SBC as if it were a hard drive, as well as slots for the addition of expansion boards
on the SBC itself. The expansion slots usually accept PCI Mezzanine Cards (PMCs).
PMCs are electrically compatible with the PCI bus but offer a smaller and more robust
package (standard PMC printed circuit board dimensions are 74mm x 149mm). PMCs are
attached to the SBC primary PCI bus. Their size allows them to fit between the SBC and
any adjacent cards on the cPCI backplane.

One of the most critical functions an SBC provides is cPCI bus arbitration by means of a
cPCI bus bridge. The cPCI architecture is a centralized architecture, requiring the SBC
to control access to the cPCI bus. The actual bus arbitration mechanism is described in
the “cPCI Bus” subsection of this chapter.

SBCs use standard desktop PC chips, enabling vendors to provide a wide range of
products, mainly differentiated by the processor they incorporate and the amount of
memory. The varying types of SBCs aim at offering different features/cost combinations
to system designers so that they can select the SBC suiting the particular needs of their
cPCI system. As a result, there are SBCs that utilize Intel Pentium, Intel Celeron,
PowerPC, or even Intel Pentium or Celeron mobile processors. Intel Pentium processors
offer higher processing capabilities than Intel Celeron processors, whereas
mobile processors offer very low power dissipation and are best suited to mobile
applications. Similarly, the amount of volatile storage that can be present on an SBC
varies from 512MB to a few gigabytes.



C.4.4 Peripheral Boards


cPCI peripheral boards are mounted on the backplane peripheral slots. Theoretically,
there can be

[(n + 1) x 8] - 1

peripheral boards in a cPCI system when using n PCI-to-PCI bridges; n can be any
number starting from 0, whereas the -1 quantity represents the slot occupied by the
SBC. However, it is common practice to include 7 slots, or 15 slots when using one
PCI-to-PCI bridge.
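The expression above can be captured in a small sketch (the function name is ours):

```python
def max_peripheral_boards(n_bridges: int) -> int:
    """Maximum peripheral boards per the [(n + 1) x 8] - 1 expression.

    Each PCI-to-PCI bridge adds one more 8-slot bus segment, and one
    slot is always occupied by the Single Board Computer.
    """
    return (n_bridges + 1) * 8 - 1

print(max_peripheral_boards(0))  # 7, a single bus segment
print(max_peripheral_boards(1))  # 15, one bridge, as noted in the text
```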

Peripheral boards can be application-specific, as in our case is the test tool, or provide
functionality oriented to a particular class of applications. The latter group of boards
usually deploys a general-purpose processor surrounded by memory resources, I/O
resources and hardware accelerators specific to the particular class of applications.
Examples of such accelerators are graphics accelerators to facilitate image and video
processing applications as well as security coprocessors that handle cryptographic
functions. To add an extra level of flexibility, this category of boards supplies slots for
PCI Mezzanine Cards. PMCs come as off-the-shelf components and provide an easy
way to expand/customize the functionality of the peripheral board. The common case is
that up to two such cards can be plugged in the board PCI bus.

Peripheral boards are attached to the system-wide cPCI bus through their J1/J2
connectors and can optionally include additional connectors to communicate with rear-
transition modules. Each peripheral board must include a PCI bridge which is delegated
PCI related tasks such as bus arbitration and data buffering during PCI transfers.

C.5 cPCI bus


The cPCI bus sits at the heart of every cPCI system (Figure C-3). As mentioned earlier,
it is a reengineering of the PCI bus, tailored to the needs of industrial and embedded
computing. The cPCI bus operates at 33MHz or 66MHz and its width can be either 32 or
64 bits. As it is a resource common to all system boards, a centralized arbitration scheme
is defined to regulate accesses, according to which bus arbitration is performed by the
SBC. The scheme utilizes REQ# and GRANT# signals. Each peripheral board has a
REQ# and a GRANT# signal, which are uniquely identified by a number between 0 and
n - n is the total number of peripheral slots in the backplane. A board connected to
peripheral slot k owns REQk and GRANTk. All peripheral boards REQ# and GRANT#
signals are connected to the SBC cPCI bus bridge. When a card connected to peripheral
slot k wishes to use the cPCI bus it asserts its REQk signal to request the bus from the
SBC. The SBC waits for any current transactions on the cPCI bus to complete and then
asserts GRANTk to indicate to the peripheral card that it now owns the bus. In case more
than one board requests the cPCI bus, a priority algorithm decides which one becomes
the next bus master.
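The arbitration scheme above can be sketched as follows. The cPCI specification leaves the priority algorithm to the implementation, so a fixed-priority scheme is assumed here purely for illustration:

```python
def grant_next(requests, priority_order):
    """Pick the next bus master among asserted REQ# lines.

    requests: set of slot numbers currently asserting their REQ# line.
    priority_order: slot numbers from highest to lowest priority (a
    fixed-priority policy assumed here; real arbiters may differ).
    Returns the slot whose GRANT# the SBC asserts, or None if idle.
    """
    for slot in priority_order:
        if slot in requests:
            return slot
    return None

# Boards in slots 2 and 5 request the bus; slot 2 has higher priority,
# so the SBC asserts GRANT2 once the current transaction completes.
print(grant_next({2, 5}, priority_order=[0, 1, 2, 3, 4, 5, 6]))  # 2
```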


[Figure: the SBC's microprocessor, memory and I/O controllers sit on a local bus behind
a PCI bridge, which attaches to the CompactPCI bus shared with peripheral boards in
slots 1 through 7]

Figure C-3 cPCI bus architecture

Ownership of the cPCI bus can be claimed by any board; as soon as it is granted
exclusive access, the board can initiate a direct transfer to any other board. This feature
makes the cPCI bus a very good choice for realizing distributed multiprocessing
systems. Processing-intensive tasks can be delegated to individual cards, under the
supervision of the SBC. The 32-bit wide cPCI bus signals are implemented on the J1
connector, whereas the 64-bit bus requires the J2 connector to be present. All Single
Board Computers must include the J1 and J2 connectors; namely, they support both the
32 and 64-bit versions. Similarly, all 64-bit wide boards must also include both connectors.
32-bit peripherals must include the J1 connector and may include J2 for user-defined
I/O. Last but not least, we should mention that not all J2 connector pins are used for
cPCI bus signals. A few are reserved and a few can be utilized for user-defined I/O. For a
detailed description of the J1 and J2 connector pins, refer to [46].

C.6 Hot Swap


Hot Swap is the ability to insert and remove boards from the system without disturbing
and/or interrupting its operation. It is the key mechanism behind the high availability of
cPCI systems, as it enables the operator to remove and re-insert a board without having
to take the whole system off-line. Considering the fact that the cPCI standard targets
industrial computers, including live board insertion and extraction was one of the primary
concerns for the PICMG.

Hot Swap circuitry is located on the cPCI boards, so as to maintain the passive
backplane of cPCI and facilitate system maintenance for administrators. As
mentioned earlier, for industrial applications it is preferred to have a simple and robust
backplane, less likely to fail than a backplane incorporating bus-driving circuitry. In the
case an active backplane fails, the whole system has to be taken off-line to extract the
backplane. Additionally, such a backplane would constitute a single point of failure for
the system; if it failed, the whole cPCI computer would fail. On the other hand, an active
backplane approach would have led to universality in cPCI boards, as there would be no
distinction between hot-swap and non-hot-swap boards. The boards would just plug into
the active backplane, which would be either hot-swap or non-hot-swap depending on the
target application needs. However, this is of little importance as the target applications of
cPCI require hot swap support.

The actual benefits of Hot Swap can be understood by considering what happens to a
system when a board is inserted or removed. The system is in full operating mode and
its backplane is fully powered. This means that all of the backplane's capacitors are fully
charged. On the contrary, the capacitors of a newly inserted card are discharged. As a
result, when the card is inserted the board capacitors draw a large amount of current in a
very short period of time in order to charge. The system power supplies cannot instantly
change their output current to provide all the current necessary while maintaining their
DC voltage within required limits. The results of this phenomenon are a drop of voltage
on the backplane - the backplane capacitors are discharged to provide current to the
board capacitors - and the creation of glitches on the backplane supply voltage. The
system becomes unstable, the board connector pins may be damaged, and chips could
be reset or even permanently damaged [47]. Similarly, when the board is extracted from
the system its bypass capacitors are fully charged. At the time of extraction they must be
smoothly discharged.

Furthermore, the insertion or extraction of the board should not have an impact on
ongoing bus transactions. During insertion/extraction, bus connector pins bounce before
a stable connection is made or broken and their capacitance is charged/discharged,
leading to glitches on the bus that can interfere with ongoing transactions. Therefore the
corresponding bus signal lines must be gracefully precharged/discharged to ensure that
current bus transactions are not interrupted.

Hot Swap circuitry is responsible for controlling the power up of uncharged boards and
managing system response. We can identify three basic parts in a hot-swap enabled
system

• the physical part, that is the way that a board is mechanically inserted/extracted
to/from the system,

• the electrical part which handles connecting/disconnecting the board to/from the
backplane,

• the software part responsible for “attaching” the board functionality to the system.

The Hot Swap specification, in its effort to allow for flexibility in terms of features and
costs related to Hot Swap, has defined three levels of increasing capability and
complexity, each one being an extension of the previous: basic Hot Swap, full Hot Swap
and high availability [48]. Each level addresses the aforementioned basic parts of a Hot
Swap system in a complementary way, in order to add hot swap features.

The basic Hot Swap architecture is the cornerstone model, with both subsequent levels
building on top of it. It describes the features needed to unplug and plug in a board
without disturbing bus activity, as well as power-up and power-down sequencing (that is,
the control of relative levels and timing between two or more supply voltages during
power-up and power-down transitions). The software
initialization sequence is manual. The network operator is responsible for configuring the
system to utilize the board services and loading any necessary drivers. Possible tasks of
the power control may be assuring a minimum delay time between a primary supply
achieving a steady state level and a secondary supply being turned on. Alternatively, the
power control logic may turn the second supply on after the primary has reached a fixed
threshold. The requirements for basic Hot Swap capability are staged pins, the ability to
electrically isolate the PCI bus circuitry on a board and power controlling circuitry. The
staged pins are located on the cPCI J1 connector and include 3 pin lengths

• long, for power and ground,

• medium, for PCI bus signals,

• short, which is a single pin - BD_SEL# - used to inform the system that the board
is inserted and that bus connection and software initialization can begin.

Full Hot Swap extends the functionality of basic Hot Swap including three additional
features. To begin with, it defines an extra signal - ENUM# - that is connected on a
common backplane ENUM bus and is used to inform the operating system that a board
insertion/extraction will follow. Furthermore, a microswitch is attached to the board
injector/ejector and is used to assert the ENUM# signal when the injector/ejector is
activated. Finally, there is an LED indicating to the operator that the operating system has
successfully deactivated the board and therefore it can be safely extracted. Full Hot
Swap enables operators to add/remove boards without reconfiguring the system
manually. The operating system can autonomously identify and utilize additional boards.

The high availability Hot Swap scheme enables dynamic hardware reconfiguration on
top of software reconfiguration. In a high availability system, the software can control a
board's state. As a result, two identical boards may be present in the same system, with
one card being active and the other acting as back-up, activated in case the primary
board fails. The high availability scenario requires that special Hot Swap controllers be
present.

A typical high availability Hot Swap board power-up sequence would be as follows [49].
The operator inserts the card into the backplane. The long pins - power and ground -
make contact first and provide early power to the board. The early power is solely used
to charge the board bypass capacitors and provide power-up sequencing. At this stage,
the LED turns on and all circuitry except for the controller is in reset condition.
Additionally, the PCI bus signal lines are precharged. Next, the medium PCI bus pins
make contact and finally the BD_SEL# pin is engaged. BD_SEL# turns on the powering
of all board circuitry. Once the circuitry has been properly powered, the Hot Swap
controller informs the CPU that the board is ready for operation using the HEALTHY#
signal, the board is released from the internal reset state and follows the reset signal
provided by the CPU. If the operating system wishes to utilize the board, it releases the
PCI bus reset and the board is incorporated into the system. Otherwise, the board remains
powered, but in reset state. The LED is turned off to indicate to the operator that the

26
Powering sequence refers to the control of relative levels and timing between two or more supply
voltages during power-up and power-down transitions.

- 134 -
Dimitris Tsaimos Msc Thesis Royal Institute of Technology

connection process is complete. The operator then activates the board injectors and the
microswitch drives the ENUM# signal to inform the CPU that the board is operational.

A typical power-down sequence would begin with the ejector activation. The microswitch
would then drive the ENUM# signal to inform the CPU that the board is about to be
extracted. The LED is turned on to indicate to the operator that the detach process is in
progress. The short pin detaches first and the board back-end circuitry enters a reset
state, whereas the PCI bus signals are precharged. Furthermore, the HEALTHY# signal
is deasserted to indicate that the board is not functional and the operating system
logically disconnects it. Then the medium and long pins detach and the LED turns off for
the operator to know that the detachment process is complete.

Last, let us mention that the Hot Swap scheme incorporates two special purpose
registers, the HS CSR (Hot Swap Configuration and Status Register) and the Extended
Capability Pointer (ECP). The HS CSR contains two bits, INS and EXT, which are used
by the CPU along with ENUM# to determine whether a board has been inserted or extracted.
Additionally, they are used to drive the LED indicating to the operator when the board
insertion/extraction process is complete. The ECP register provides a mechanism for the
system software to determine whether the Hot Swap circuitry includes extended capabilities.

C.7 cPCI Benefits Summary


Throughout this chapter we presented the major points of the cPCI specification and the
benefits that derive from the elements it incorporates. To summarize, the major cPCI
features are

• a high bandwidth board interconnection scheme, based on a re-engineering of
the PCI bus,

• enhanced board I/O capabilities by means of the J1-J5 connectors and the large
number of user-defined connector pins,

• ease of maintenance and configuration, utilizing rear-panel I/O and the lack of
board cabling,

• software mechanisms that enable the automatic detection and configuration of a
cPCI card by the host computer,

• increased mean time between failures and system high availability, due to the
Hot Swap specification,

• specially designed mechanical characteristics, which provide tolerance to the
harsh operating conditions of industrial environments.

However, we should mention that the centralized arbitration scheme of cPCI is not well
suited to multiprocessing systems, where a particular task is executed simultaneously
by two or more boards. cPCI enables multiprocessing only in the form where the
execution of a particular task is entirely delegated to a specific card. Nevertheless, cPCI
still incorporates a


wide variety of features needed by high-performance industrial, telecommunications and
networking equipment. As the Keletron test tool addresses mobile telecommunications
network equipment manufacturers and operators, it was decided that the system
designed would be based on the cPCI specification. Furthermore, a cPCI-based system
would ensure the flexibility necessary to accommodate the variety of hardware
characteristics needed by the test tool.


References
[1] http://www.3gpp.org/

[2] “UMTS Protocols and Protocols Testing”, International Engineering
Consortium Web ProForum Tutorials,
http://www.iec.org/online/tutorials/umts/

[3] "3GPP TS 23.060 v6.6.0, General Packet Radio Service (GPRS); Service
description; Stage 2 (Release 6)”,
http://www.3gpp.org/ftp/Specs/html-info/23060.htm

[4] “3rd Generation Partnership Project; Technical Specification Group Core
Network; General Packet Radio Service (GPRS); GPRS Tunneling Protocol
(GTP) across the Gn and Gp interface (Release 6)”

[5] “GSM Phase 2+ General Packet Radio Service GPRS: Architecture, Protocols,
and Air Interface”, Christian Bettstetter, Hans-Jörg Vögel and Jörg
Eberspächer, Technische Universität München (TUM), IEEE Communications
Surveys, Third Quarter 1999, vol. 2, no. 3

[6] "3GPP TS 29.061 v6.2.0, Interworking between the Public Land Mobile
Network (PLMN) supporting packet based services and Packet Data Networks
(PDN), (Release 6)”,
http://www.3gpp.org/ftp/Specs/html-info/29061.htm

[7] “The Point-to-Point Protocol”, editor W. Simpson, RFC1661, July 1994

[8] “Layer 2 Tunneling Protocol - Version 3 (L2TPv3)”, editors J. Lau, M. Townsley,
Cisco Systems, I. Goyret, Lucent Technologies, RFC3931, March 2005

[9] “Ethernet Technologies”, Internetworking Technologies Handbook, Chapter 7,
Cisco Systems Inc,
http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/ethernet.htm

[10] http://www.trendcomms.com/multimedia/training/broadband%20networks/web/main/
Ethernet/Theme/Chapter1/EnetPhysicalMedia.html

[11] “IXF440 10/100 Mbps Ethernet Controller Datasheet”, Intel Corporation, September
2000

[12] William F. Bernd, Gerald A. Capehart, “Logarithmic Timer Management
Algorithm”, Motorola Technical Developments

[13] Silvio Dragone, Andreas Doring, Rainer Hagenau, “A Large-Scale Hardware
Timer Manager”, http://www.ece.northwestern.edu/EXTERNAL/anchor/
ANCHOR04/final_manuscripts/paper_8.pdf


[14] Marco Heddes, “A Hardware/Software Codesign Strategy for the
Implementation of High-Speed Protocols”, PhD thesis, Eindhoven University of
Technology, 1995

[15] Ranjita Bhagwan, Bill Lin, “Fast and Scalable Priority Queue Architecture for High-
Speed Network Switches”, Proceedings of the IEEE Infocom Conference, pp.
538-547, Tel-Aviv, Israel, March 2000.

[16] Aggelos Ioannou, Manolis Katevenis, “Pipelined Heap (Priority Queue)
Management for Advanced Scheduling in High-Speed Networks”,
archvlsi.ics.forth.gr/muqpro/pipeHeap_ioan_icc01.pdf

[17] Simon W. Moore, Brian T. Graham, “A Tagged up/down sorter – A hardware
priority queue”, The Computer Journal, Volume 38, Issue 9, pp. 695-703, 1995

[18] Stephan Olariu, M. Cristina Pinotti, Si Qing Zheng, “An Optimal Hardware
Algorithm for Sorting Using a Fixed-Size Parallel Sorting Device”,
IEEE Transactions on Computers, Vol. 49, No. 12, pp. 1310-1324, 2000

[19] http://www.cprogramming.com/tutorial/computersciencetheory/heap.html

[20] “MCM69C232 datasheet”, Motorola Semiconductor Technical Data,
http://www.ortodoxism.ro/datasheets/motorola/MCM69C232.pdf

[21] http://www.music-ic.com/

[22] http://www.mosaid.com/

[23] Clive Maxfield, “The Design Warrior’s Guide to FPGAs, Devices, Tools and
Flows”, Newnes, ISBN 0750676043

[24] “Virtex-II Platform FPGAs: Complete Data Sheet”, Product Specification,
Xilinx, March 1, 2005

[25] “Internet Protocol”, DARPA Internet Program, Protocol Specification, RFC791,
September 1981

[26] “User Datagram Protocol”, editor J. Postel, RFC768, August 1980

[27] Andrew Rushton, “VHDL for logic synthesis”, 2nd Edition, John
Wiley & Sons, ISBN 0-471-98325-X

[28] mufasa.informatik.uni-mannheim.de/lsra/lectures/ws98_99/vl_simu/vorlesung/
vl_hw_designflow.pdf

[29] “ISE 7 In-Depth Tutorial”, Xilinx Inc, 2005

[30] “Development System Reference Guide”, Xilinx Inc


[31] http://www.ethereal.com/

[32] 3GPP TS 23.107 v6.3.0 (2005-06), “3rd Generation Partnership Project; Technical
Specification Group Services and System Aspects; Quality of Service (QoS) concept
and architecture (Release 6)”, Technical Specification

[33] 3GPP TS 22.105 v6.4.0 (2005-09), “3rd Generation Partnership
Project; Technical Specification Group Services and System Aspects; Service aspects;
Services and service capabilities (Release 6)”

[34] 3GPP TS 26.236 v6.4.0 (2005-09), “3rd Generation Partnership Project; Technical
Specification Group Services and System Aspects; Packet switched conversational
multimedia applications; Transport protocols (Release 6)”

[35] 3GPP TR 26.937 v6.0.0 (2004-03), “3rd Generation Partnership Project; Technical
Specification Group Services and System Aspects; Transparent end-to-end packet
switched streaming service (PSS); RTP usage model (Release 6)”

[36] “Interconnections Second Edition, Bridges, Routers, Switches and
Internetworking Protocols”, Radia Perlman, Addison-Wesley,
ISBN 0-201-63448-1

[37] “Asynchronous Transfer Mode Switching”, Internetworking
Technologies Handbook, Chapter 27, Cisco Systems Inc,
http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/atm.htm

[38] “ATM networks”, Othmar Kayas, Gregan Crawford, Prentice Hall PTR,
ISBN 0130936014

[39] “3GPP TS23.002 v6.5.0 (2004-06), 3rd Generation Partnership Project; Technical
Specification Group Services and System Aspects; Network Architecture
(Release 6)”,
http://www.3gpp.org/ftp/Specs/html-info/23002.htm

[40] http://www.picmg.org/

[41] “Teaming up with an enclosure manufacturer can aid your design process”, Ryan J.
Maley, Equipto Electronics Corp,
http://www.compactpci-systems.com/articles/id/?187

[42] “Enclosure Design Guidance Planning for NEBS Compliance”,
Michael R. Palis, Hybricon Corp., July 25 2002,
http://www.hybricon.com/products/engineering/techfocus/NEBS_Paper.pdf

[43] http://en.wikipedia.org/wiki/Backplane

[44] “The CompactPCI Report: Rear Panel I/O”, Eike Waltz, Joe Pavlat, CompactPCI
systems, Fall 1998/1,
http://www.compactpci-systems.com/columns/CompactPCI_Report/pdfs/fall.1998.pdf


[45] http://en.wikipedia.org/wiki/Single_board_computer

[46] “CompactPCI Short Form Specification”, PICMG, November 1, 1995

[47] “Introduction to Hot Swap”, Jonathan M. BearField, Texas Instruments,
http://www.techonline.com/community/related_content/14664

[48] “CompactPCI Hot Swap Architectures”, Joe Pavlat, November 11 2005,
http://www.commsdesign.com/main/9810/9810feat4.htm

[49] “CompactPCI Hot Swap Controllers with Bus Precharge, On-Chip
Intercept of PCI Reset Signal and Much More”, Andrew Gardner, Linear
Technology

