
DD2490 p4 2010

Routing architecture and forwarding

(Intro to Homework 4)
Olof Hagsand, KTH/CSC

Connecting devices

Networking devices

Internetworking devices

Hub/ Repeater L1

Bridge/ Switch L2

Router L3

Application gateway L4-L7

IEEE 802 vs IPv4 addresses

IEEE 802 (MAC) address: 48 bits, written as 6 bytes in hex
  Example: 00:0E:35:64:E9:E7
  First 3 bytes: vendor code. Last 3 bytes: vendor assigned
  The first byte holds the Group/Individual bit and the Global/Local bit

IPv4 address: 32 bits, divided into netid and hostid
  Example: 192.36.125.18
  In binary: 11000000 00100100 01111101 00010010
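The two address formats can be inspected with a few bit operations. The following is a minimal sketch, not part of the original slides: the function names are hypothetical, and the /24 prefix length used in the usage note is an assumption, since the slide does not say where the netid of 192.36.125.18 ends.

```c
#include <stdint.h>

/* Group/Individual bit: least significant bit of the first byte
 * (1 = group address, 0 = individual address). */
static int mac_is_group(const uint8_t mac[6])
{
    return mac[0] & 0x01;
}

/* Global/Local bit: second least significant bit of the first byte
 * (1 = locally administered, 0 = globally unique). */
static int mac_is_local(const uint8_t mac[6])
{
    return (mac[0] >> 1) & 0x01;
}

/* netid of an IPv4 address (host byte order) for a prefix length 0..32. */
static uint32_t ipv4_netid(uint32_t addr, int masklen)
{
    uint32_t mask = masklen == 0 ? 0 : 0xFFFFFFFFu << (32 - masklen);
    return addr & mask;
}
```

For 00:0E:35:64:E9:E7 both bits are 0 (individual, globally unique). With an assumed /24 prefix, the netid of 192.36.125.18 (0xC0247D12) is 192.36.125.0.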

Routing vs bridging

Bridging - forwarding on layer 2
  A MAC address/ID has a flat structure
    many nodes -> large forwarding tables
    broadcast reaches all nodes
  Simple to configure and manage, cheaper
  Loops detected by spanning tree protocol

Routing - forwarding on layer 3
  The netid of the IP addresses can be aggregated
    many nodes -> smaller forwarding tables than bridging
    routers partition broadcast domains
  Routing is more difficult to configure
  Loops detected by routing protocols and TTL decrementation

What does a router do?

Packet forwarding
  Not only IPv4: IPv6, MPLS, Bridging/VLAN, Tunneling, ...
Filter packets
Metering/Shaping/Policing
Compute routes: build forwarding table
In the background: routing
In real-time: forwarding

[Figure: packet processing stages: access lists, classifier, lookup, metering, shaping]

Router Components
Control processor (CPU module, routing engine): CPU, routing table, memory
  Execute routing protocols, compute routing table, configure line cards, ...
Line cards: memory, packet processing, MAC, external links
  Examine headers, routing decision, ...
  Input buffering, waiting for access to output port, ...
  Output buffering, waiting for transmission, ...
Interconnect
QoS scheduling, ...

Fast path, slow path


[Figure: a control processor (CPU, routing table, memory) and four line cards. The fast path runs between line cards; the slow path goes via the control processor]

Fast path: the line cards can determine the outgoing port
Slow path: the control processor must determine the outgoing port

Inside a router, 1st Generation


[Figure: a CPU with RIB and buffer memory, and three line cards, all attached to a shared bus backplane]

Every packet goes twice over the shared bus
Constrained by
  bus and memory bandwidth (per-byte cost)
  CPU cycles (per-packet cost)

Inside a hardware-based router


[Figure: a switched backplane connecting line cards, each with buffer memory and a forwarder, and a CPU card with the RIB]

Multiple simultaneous transfers over the backplane
Specialized hardware: ASICs (Application-Specific ICs)
Wirespeed at 100 Gb/s and beyond

Crossbar Architecture
Space division approach
Switched interconnection between input and output
Centralized controller
  coordinates input-output ports
  activates paths between ports
Multiple transfers can proceed simultaneously
Crossbar is non-blocking

[Figure: switching fabric with controller and interface logic, input ports 1..N and output ports 1..M]

Shared Bus Architecture


[Figure: input ports and output ports attached to a shared bus]

Relies on time division: the internal data path is shared
Address, control, and data lines and a bus protocol
Granularity
  Packet granularity: simple, but may result in delay problems
  Block granularity: more overhead, but avoids long delays

Routing table lookup


Longest prefix first
Divide the table into buckets, one for each netmask length (0 to 32)
Match destination with longest prefixes first
SW algorithms: trees, binary trees, tries (different data structures)
HW support: TCAMs (Ternary Content-Addressable Memory)

[Figure: the destination IP address is compared against buckets of netids, one per mask length 0, 1, ..., 31, 32]
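The bucket idea can be sketched in software as a scan over mask lengths from 32 down to 0, returning the first match. This is a deliberately naive illustration under stated assumptions: the `struct route` layout and function names are hypothetical, addresses are in host byte order, and a real implementation would index each bucket (for example with a hash table) instead of rescanning the whole table per length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry: prefix (host byte order), mask length, interface. */
struct route {
    uint32_t prefix;
    int masklen;
    const char *ifname;
};

static uint32_t mask_of(int masklen)
{
    return masklen == 0 ? 0 : 0xFFFFFFFFu << (32 - masklen);
}

/* Try mask lengths 32, 31, ..., 0; the first hit is the longest match.
 * Returns the interface name, or NULL if nothing matches. */
static const char *lpm_lookup(const struct route *tab, size_t n, uint32_t dst)
{
    for (int len = 32; len >= 0; len--) {
        for (size_t i = 0; i < n; i++) {
            if (tab[i].masklen == len &&
                (dst & mask_of(len)) == tab[i].prefix)
                return tab[i].ifname;
        }
    }
    return NULL;
}
```

With the entries 10.2.0.0/24 -> e2 and 0.0.0.0/0 -> e1, a lookup of 10.2.0.2 returns e2, while any address outside the /24 falls through to the default route e1.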

Using a Trie for lookup


Binary tree
  Nodes are prefixes
  Left branch represents 0 in the string
  Right branch represents 1

Prefix table:
  a  *
  b  10*
  c  01*
  d  110*
  e  0010
  f  0110
  g  0111

Trie nodes, level by level:
  *
  0*  1*
  00*  01*  10*  11*
  001*  011*  110*
  0010  0110  0111
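A lookup in such a trie walks the key one bit at a time and remembers the last node that carried a prefix. The sketch below is an illustration only: keys and prefixes are given as strings of '0' and '1' to mirror the 4-bit example, the names are hypothetical, and allocation failures are not handled.

```c
#include <stdlib.h>

struct trie_node {
    struct trie_node *child[2]; /* child[0]: next bit 0, child[1]: next bit 1 */
    char label;                 /* route label, or 0 if no prefix ends here */
};

/* Allocate a zeroed node (no error handling in this sketch). */
static struct trie_node *trie_new(void)
{
    return calloc(1, sizeof(struct trie_node));
}

/* Insert a prefix given as a string of '0'/'1'; "" is the default route *. */
static void trie_insert(struct trie_node *root, const char *bits, char label)
{
    struct trie_node *n = root;
    for (; *bits; bits++) {
        int b = *bits - '0';
        if (!n->child[b])
            n->child[b] = trie_new();
        n = n->child[b];
    }
    n->label = label;
}

/* Follow the key bits, remembering the last labelled node passed.
 * Returns the label of the longest matching prefix, or 0 if none. */
static char trie_lookup(const struct trie_node *root, const char *key)
{
    char best = 0;
    const struct trie_node *n = root;
    while (n) {
        if (n->label)
            best = n->label;
        if (!*key)
            break;
        n = n->child[*key++ - '0'];
    }
    return best;
}
```

With the prefix table above, the key 0110 ends at f, 0100 stops at c (01*), and 1110 falls back to the default route a because 11* carries no prefix of its own.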

Elimination of Internal Prefixes


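No overlapping prefixes
Prefix expansion with leaf pushing
Simplifies lookup at the expense of larger memory

Prefix table:
  a  *
  b  10*
  c  01*
  d  110*
  e  0010
  f  0110
  g  0111

[Figure: the trie with nodes 00*, 01*, 10*, 11* after leaf pushing; the internal prefix a is pushed down to the leaves]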

Linear Search on Values: TCAM

Ternary Content-Addressable Memory
  Fully associative memory
  Compare input with all words in parallel
  First match gives the result
  Three values for each bit: 0, 1, and x (don't care)
Up to 100 million searches per second

Prefix table:
  a  *
  b  10*
  c  01*
  d  110*
  e  0010
  f  0110
  g  0111

TCAM contents, in match order:
  0010  e
  0110  f
  0111  g
  110x  d
  01xx  c
  10xx  b
  xxxx  a
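The ternary match can be modelled in software with a value and a "care" mask per entry. The loop below only reproduces the result of a TCAM, not its speed: real hardware compares all words in parallel. The entry layout and names are assumptions for illustration, using the 4-bit example from the slide.

```c
#include <stddef.h>
#include <stdint.h>

/* One ternary word: bits where care is 0 are "x" (don't care). */
struct tcam_entry {
    uint8_t value;
    uint8_t care;
    char label;
};

/* Entries sorted longest prefix first, as in the slide's 4-bit example. */
static const struct tcam_entry tcam[] = {
    {0x2, 0xF, 'e'}, /* 0010 */
    {0x6, 0xF, 'f'}, /* 0110 */
    {0x7, 0xF, 'g'}, /* 0111 */
    {0xC, 0xE, 'd'}, /* 110x */
    {0x4, 0xC, 'c'}, /* 01xx */
    {0x8, 0xC, 'b'}, /* 10xx */
    {0x0, 0x0, 'a'}, /* xxxx */
};

/* Return the label of the first matching entry, or 0 if none matches. */
static char tcam_lookup(uint8_t key)
{
    for (size_t i = 0; i < sizeof tcam / sizeof tcam[0]; i++) {
        if ((key & tcam[i].care) == tcam[i].value)
            return tcam[i].label;
    }
    return 0;
}
```

Because the first match wins, the ordering of the entries is what makes the result a longest prefix match: 0110 hits f before it could hit c (01xx) or a (xxxx).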

TCAM layout
Route lookup in one memory access
Prefixes ordered by length
First match wins, so contents need to be sorted (longest prefixes first)

[Figure: TCAM layout ordered by prefix length: 32-bit prefixes, 31-bit prefixes, ..., 24-bit prefixes, ..., 8-bit prefixes]

Packet classification
Map a packet to a class
Class defined by filters, usually a 5-tuple:
  <source IP, destination IP, source port, destination port, protocol>
For example, all packets:
  From subnet N
  To TCP port 80 on web-server S
  From subnet N to port 666 on subnet M
Applications:
  Firewall & NAT
  Blocking
  Accounting
  Policy routing
  QoS: metering, policing, DiffServ marking, ...
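A 5-tuple classifier can be sketched as a list of filters with wildcards, searched linearly. Everything here is an assumption made for illustration: the struct layouts, the use of -1 as a wildcard, and the concrete values chosen for subnet N and server S in the usage note. Production classifiers use faster structures than a linear list.

```c
#include <stddef.h>
#include <stdint.h>

/* Header fields of a packet, addresses and ports in host byte order. */
struct five_tuple {
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t proto;
};

/* Hypothetical filter: prefixes for the addresses, -1 as wildcard elsewhere. */
struct filter {
    uint32_t src, src_mask;
    uint32_t dst, dst_mask;
    int sport, dport; /* -1 = any port */
    int proto;        /* -1 = any protocol */
    int class_id;
};

static int filter_matches(const struct filter *f, const struct five_tuple *p)
{
    return (p->src & f->src_mask) == f->src
        && (p->dst & f->dst_mask) == f->dst
        && (f->sport < 0 || f->sport == p->sport)
        && (f->dport < 0 || f->dport == p->dport)
        && (f->proto < 0 || f->proto == p->proto);
}

/* Return the class of the first matching filter, or -1 if none matches. */
static int classify(const struct filter *fs, size_t n,
                    const struct five_tuple *p)
{
    for (size_t i = 0; i < n; i++) {
        if (filter_matches(&fs[i], p))
            return fs[i].class_id;
    }
    return -1;
}
```

Taking N = 10.1.0.0/24 and S = 10.2.0.2 as example values, the filter "from subnet N to TCP port 80 on S" becomes {0x0A010000, 0xFFFFFF00, 0x0A020002, 0xFFFFFFFF, -1, 80, 6, 1}, where 6 is the IP protocol number of TCP.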

Cisco 12816
Port density examples
  30 x OC-192 (10 Gb/s) ports
  120 x OC-48 (2.5 Gb/s) ports
  15 x 10 Gigabit Ethernet ports
  60 x 1 Gigabit Ethernet ports
Capacity: 1.28 Tb/s
Power: 4.7 kW
Dimensions (from figure): 19", 6 ft, 2 ft

Cisco CRS-1
Cisco's current flagship: Carrier Routing System
3-stage (multi-stage) switching plane, >50% of cost
Trie prefix lookup
7.5 kW
Each slot has 40 Gb/s
32 Tb/s raw bandwidth
Distributed RP
Several logical routers
Optical-electrical transitions: O-E-O-E-O-E-O

Juniper Routers
M-series
  Shipping started 1998
  M5, M10, M20, M40e, M160, M320
  8 x OC-192 or 32 x OC-48 ports in an M160
T-series
  Shipping started 2002
  T320, T640
  32 x OC-192 or 128 x OC-48 ports in a T640

Juniper M160
  Capacity: 80 Gb/s
  Power: 2.6 kW
  Dimensions (from figure): 19", 3 ft, 2.5 ft

Juniper J-series
J-series
  Routers used in labs
  Emulates M/T series
  Full routing software

Open source routing


Linux, BSD platforms
Most routing protocols exist as open source projects (e.g. Quagga)
But PC hardware has traditionally been a limiting factor
But now up to 2x12-core CPUs, inter-processor buses (HT, QPI), non-uniform memory (NUMA), multiple buses (PCI-E), and 10 Gb/s NICs enable forwarding speeds of tens of gigabits per second
Example: the Bifrost open source router (UU/KTH)

Example: PC routing architecture


[Figure: two groups of four CPU cores (CPU 0-3 and CPU 4-7), each with three DDR3 memory channels, connected by QPI to each other and to two I/O handlers (north bridges); each I/O handler has PCI-E x16, x4 and x16 slots carrying 10 Gb/s network ports]

Multi-core CPUs: (Intel Nehalem) 8 cores, 16 with Hyper-Threading
Multi-channel: each network card has 8 DMA queues
NUMA: non-local memory (many memory banks)
Inter-processor bus: QPI 2.4 GHz, ~76 Gb/s
Memory: DDR3-1066, 68 Gb/s x 3 channels
I/O bus: PCI-E gen2, x1 ~4 Gb/s, x16 ~64 Gb/s

Homework 4
4a) Write a report on how forwarding works
4b) Make a programming assignment in C
  Part 1: Print out IPv4 destination address
  Part 2: Make an IPv4 forwarding lookup

Homework 4a) Report


The assignment is a report about forwarding, for students with little programming experience.
The report should, in a terse (not wordy) format, describe the forwarding performed by a router in the form of an algorithm description, that is, a specification for an implementation.
The report should list the necessary steps a router performs to forward a packet from an input Ethernet interface card to an output Ethernet interface card.
The following steps should be covered:
  MAC address lookup
  IPv4 and IPv6 forwarding
  Header sanity checks
  Header modifications
  Limited ICMP handling (at least one error case)
  L2 header decapsulation and encapsulation
  ARP lookup
  Statistics: interface packet and error counters
  Local delivery
The following steps need not be covered:
  Full ICMP handling
  Other protocols
  IP options
  Transport protocols/socket handling

Homework 4b: Part 1


You should read an Ethernet frame, identify it as an IPv4 packet, and print the IPv4 destination address.
Input: Ethernet packet. Example:
  0200 0000 0011 0200 000c 0001 0800 4500 0026 17d4 0000 ff01 8ffc 0a01 0002 0a02 0002 0000 e802 c04b 0004 3e89 339a 0786 d0ff 0009
Output: IPv4 address. Example:
  10.2.0.2
Errors:
  Error: packet too short: length of frame in bytes
  Error: Not ipv4 payload: payload type
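The example input is a hex dump, so a first step is to turn the text into a byte buffer. A minimal sketch, assuming the input is available as a C string and that whitespace between digit groups is insignificant; the function names are hypothetical and the exact input handling expected by Kattis is not specified on the slide.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Parse hex text such as "0200 0000 0011" into buf, skipping whitespace.
 * Returns the number of bytes written, or -1 on an invalid character,
 * an odd number of digits, or when buf (capacity cap) is too small. */
static int parse_hex(const char *text, uint8_t *buf, size_t cap)
{
    size_t n = 0;
    int hi = -1; /* pending high nibble, or -1 if none */

    for (; *text; text++) {
        if (isspace((unsigned char)*text))
            continue;
        int v = hexval((unsigned char)*text);
        if (v < 0)
            return -1;
        if (hi < 0) {
            hi = v;
        } else {
            if (n == cap)
                return -1;
            buf[n++] = (uint8_t)(hi << 4 | v);
            hi = -1;
        }
    }
    return hi < 0 ? (int)n : -1;
}
```

The returned byte count is the frame length that the "packet too short" check compares against.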

Homework 4b: Part 2


The program should read a forwarding table and an Ethernet packet, extract the destination IP address, make a lookup in the forwarding table, and write the outgoing interface name.
The assignment is a step towards full forwarding but lacks several sanity checks, MAC address lookups and ARP. It is intended to illustrate how to inspect packet headers, the use of pointers and buffers, and IP longest prefix match.
The program should do the following:
  Read a routing table from stdin. The routing table consists of a list of prefix, nexthop interface triples.
  Read a single Ethernet (RFC 894) packet from stdin.
  Verify that the packet is long enough to contain an Ethernet and IPv4 header
  Verify that the Ethernet payload type is 0x0800 (IPv4)
  Verify that the IP version field is 4
  Extract the destination address from the IPv4 header, make a longest prefix match lookup, and return the outgoing interface name.
Example input:
  fib 10.1.0.0/24 e1
  fib 10.2.0.0/24 e2
  fib 10.3.0.0/24 e3
  fib 0.0.0.0/0 e1
  input 0200 0000 0001 0200 0000 0011 0800 4500 0026 17d4 0000 ff01 8ffc 0a01 0002 0a02 0002 0000 e802 c04b 0004 3e89 339a 0786 d0ff 0009
Example output:
  e2

Homework 4b: Kattis


If you have registered, you should get a Kattis account
Use the link on the homework page and log in
Submit by
  selecting language: C
  selecting problem: forwarding (part 1), forwarding2 (part 2)
  uploading the file
  pressing Submit
You can see the status on the web page
  Compile error
  Runtime error
  Wrong output
  OK
You will also get a mail
Submit the solution electronically, or on paper to the lab assistants or course leader, before the deadline. Append a receipt showing that you passed both the forwarding and forwarding2 tests on Kattis.

Extracting correct info


The Ethernet header is 14 bytes
  payload type is in bytes 13-14
  IPv4 is 0x0800
The IP header is 20 bytes (without options)
  The destination IP address is in bytes 17-20

  struct ethhdr {
      char     da[6], sa[6];       /* destination and source address */
      uint16_t pt;                 /* payload type */
  };

  struct iphdr {
      /* note: bit-field order within the byte is compiler and endianness dependent */
      unsigned int ip_v:4, ip_hl:4; /* version, header length */
      uint8_t  ip_tos;              /* type of service */
      uint16_t ip_len;              /* total length */
      uint16_t ip_id;               /* identification */
      uint16_t ip_off;              /* fragment offset field */
      uint8_t  ip_ttl;              /* time to live */
      uint8_t  ip_p;                /* protocol */
      uint16_t ip_sum;              /* checksum */
      uint32_t ip_src, ip_dst;      /* source and dest address */
  };
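The byte positions above translate directly into array indexing (bytes 13-14 are indexes 12 and 13 when counting from 0). The following is a sketch of one possible approach, not the course's reference solution: it reads the fields byte by byte instead of casting to the structs, which sidesteps byte order and alignment issues. The function name and return codes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN       14     /* Ethernet header length in bytes */
#define IP_HLEN_MIN    20     /* IPv4 header length without options */
#define ETHERTYPE_IPV4 0x0800

/* Extract the IPv4 destination address of an Ethernet frame into dst[4].
 * Returns 0 on success, -1 if the frame is too short,
 * -2 if the payload type is not IPv4. */
static int ipv4_dst(const uint8_t *frame, size_t len, uint8_t dst[4])
{
    if (len < ETH_HLEN + IP_HLEN_MIN)
        return -1;

    /* Payload type: bytes 13-14 of the frame, big-endian. */
    uint16_t pt = (uint16_t)(frame[12] << 8 | frame[13]);
    if (pt != ETHERTYPE_IPV4)
        return -2;

    /* Destination address: bytes 17-20 of the IP header. */
    const uint8_t *ip = frame + ETH_HLEN;
    for (int i = 0; i < 4; i++)
        dst[i] = ip[16 + i];
    return 0;
}
```

Printing the result with `printf("%u.%u.%u.%u\n", dst[0], dst[1], dst[2], dst[3])` gives the dotted form; for the example frame from Part 1 the four bytes are 0a 02 00 02.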

Forwarding: Details
Ethernet decoding
  Check Ethernet header length
  Check Ethernet destination address
  Dispatch on payload type for IPv4
IP header sanity checks
  IP header length checks (check buffer length vs hdr-len field vs total length field)
  IP packets containing IP options should be relayed without action
  Check IP header version
  Check checksum
Forwarding
  FIB lookup for outgoing interface and nexthop/directly connected host
  TTL check, decrementation and checksum recalculation
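The checksum check and the TTL step can be sketched as follows. The IPv4 header checksum is the one's-complement of the one's-complement sum of the header's 16-bit words; summing a valid header including its checksum field therefore yields 0. The function names are hypothetical, and the sketch recomputes the checksum from scratch after the TTL change rather than updating it incrementally.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over hlen header bytes (hlen is even).
 * Over a valid header including its checksum field the result is 0. */
static uint16_t ip_checksum(const uint8_t *hdr, size_t hlen)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < hlen; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* TTL check, decrementation and checksum recalculation.
 * Returns 0 on success, or -1 if the TTL has expired; in that case a
 * router would drop the packet and send an ICMP time exceeded message. */
static int ip_decrement_ttl(uint8_t *hdr, size_t hlen)
{
    if (hdr[8] <= 1)                  /* byte 8 is the TTL field */
        return -1;
    hdr[8]--;
    hdr[10] = hdr[11] = 0;            /* zero the checksum field first */
    uint16_t sum = ip_checksum(hdr, hlen);
    hdr[10] = (uint8_t)(sum >> 8);
    hdr[11] = (uint8_t)(sum & 0xFF);
    return 0;
}
```

For the example packet in Part 1 (TTL ff, checksum 8ffc), decrementing the TTL to fe changes the checksum to 90fc.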

Forwarding: Details (cont)


Encoding of Ethernet header
  Get the correct Ethernet destination address
  Get the correct Ethernet source address
Transmission on outgoing interface
Statistics
  A limited set of statistics can be gathered, as follows (these are a subset of the IP SNMP MIB):
  ipInReceives - total number of IP packets received.
  ipInHdrErrors - packets with errors in IP header (length, checksum, version, etc).
  ipForwDatagrams - number of successfully forwarded datagrams.

Byte ordering / Endianness


CPUs represent numbers they load/store from memory differently
  Most significant byte in first byte: big-endian (big end first)
  Most significant byte in last byte: little-endian
  There is also middle-endian and bi-endian

Example: register value 0A0B0C0D stored at memory addresses n .. n+3
  Address        n    n+1  n+2  n+3
  Little-endian  0D   0C   0B   0A
  Big-endian     0A   0B   0C   0D
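The memory layout can be observed directly by storing 0A0B0C0D and looking at the first byte. A small sketch with a hypothetical function name:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    uint32_t reg = 0x0A0B0C0D;
    uint8_t mem[4];

    memcpy(mem, &reg, sizeof reg); /* copy the in-memory representation */
    return mem[0] == 0x0D;         /* least significant byte stored first? */
}
```

On x86 this returns 1, since the byte at address n is 0D.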

Network byte order


The way the CPU stores/loads numbers from memory is called host byte order
But in communication systems, we sometimes have to transfer numbers in binary format (character arrays are never a problem)
We have to agree on a format to encode numbers
This is called network byte order
  In IP, network byte order is big-endian
Therefore, in portable code, if you transfer binary numbers between nodes, always translate between host byte order and network byte order.
BSD has the following help functions:
  htonl, ntohl (4-byte numbers)
  htons, ntohs (2-byte numbers)
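A short sketch of how the help functions are typically used when writing to and reading from a packet buffer. `htonl` and `ntohl` come from `<arpa/inet.h>` on POSIX systems; the wrapper names `put_u32` and `get_u32` are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit host-order number in a buffer in network byte order. */
static void put_u32(uint8_t *buf, uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(buf, &net, sizeof net);
}

/* Read a 32-bit network-order number from a buffer into host order. */
static uint32_t get_u32(const uint8_t *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

Regardless of the host's endianness, `put_u32(buf, 0x0A020002)` writes the bytes 0A 02 00 02 in that order, which is how the address 10.2.0.2 appears in an IP header.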

Alignment
Data structures must be aligned in memory when accessed as several bytes.
In particular, 2-byte, 4-byte and 8-byte numbers must be aligned on word boundaries
  Otherwise a bus error occurs (in serious cases, e.g. SPARC)
  Or a performance degradation (as in x86)
Typically,
  2-byte numbers must be 2-byte aligned
  4-byte numbers must be 4-byte aligned
  Etc
In Eth+IP, the Eth header is 14 bytes, which makes the IP header misaligned (actually, the fields of the IP header)
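The misalignment is easy to see from the offsets: the IP destination address starts 16 bytes into the IP header, so at offset 14 + 16 = 30 in the frame, which is not a multiple of 4. A common way around it is to copy the field with `memcpy` instead of dereferencing a cast pointer. The sketch below assumes the frame buffer itself starts at a 4-byte aligned address; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Is a field of the given size at this offset naturally aligned? */
static int is_aligned(size_t offset, size_t size)
{
    return offset % size == 0;
}

/* Load a 32-bit value from any address without an unaligned access.
 * The risky alternative would be *(const uint32_t *)p.
 * The result is still in network byte order. */
static uint32_t load_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Here `is_aligned(14 + 16, 4)` is false, so the destination address should be read with `load_u32_unaligned(frame + 30)` followed by `ntohl`.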

Alignment example
[Figure: the 4-byte number 0A0B0C0D stored as bytes 0A 0B 0C 0D in memory]
  Stored at addresses 100-103 (4-byte aligned): OK
  Stored at an address that is not a multiple of 4: BUS ERROR!
