Sai Akshit Kumar Gampa

1207640404

VLSI Design

Siva Karthik Uddandam


1207864459

LAB 4: MS2
Introduction:
On-chip routers are in high demand because systems such as mobile phones, computers, tablets and smart
watches now use multiple cores to improve efficiency and performance. The cores need to share data, so
the processors must be able to communicate with one another: a data packet from any core should be able
to reach any other core in the system. On-chip routers provide this by transferring data packets to and
from the processors on the chip. This lab project builds a 4X4 router that can connect 16 processors.

Router 1X1:
Idea behind 1X1 router:
The 1X1 router is a simple device that takes in data from its local processor or from a neighboring
core and pushes it in the appropriate direction so that it reaches its destination. This requires a few
components: memory to store incoming bursts of data that request the same output direction, and an
arbitrator that routes the data and resolves the priority for each output direction based on the input
direction. The 1X1 router also performs a handshake before passing valid data to the next neighboring
router.

1X1 router RTL design:


Arbitrator:
The routing logic is implemented in an arbitrator module, which decides how to route an incoming packet
based on its destination address. The arbitrator first checks whether the packet needs to move along the
X axis and, if so, pushes it in that direction. If no movement along the X axis is needed, the packet is
pushed along the Y axis. Once the packet reaches its destination router, it is pushed to the local port.
When several input ports compete, the arbitrator uses the precedence order N > E > S > W > L.
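The routing decision and the input precedence described above can be sketched as a small behavioural model. This is only an illustration in Python, not the report's SystemVerilog: the function names, the (x, y) address format, and the convention that x grows toward East and y toward North are all assumptions.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]

# Input-port precedence stated in the text: N > E > S > W > L.
PRIORITY = ["N", "E", "S", "W", "L"]


def xy_route(cur: Coord, dst: Coord) -> str:
    """Return the output port for a packet at router `cur` headed to `dst`.

    Assumption: x grows toward East and y grows toward North.
    """
    cx, cy = cur
    dx, dy = dst
    if dx != cx:                       # X axis first
        return "E" if dx > cx else "W"
    if dy != cy:                       # then Y axis
        return "N" if dy > cy else "S"
    return "L"                         # arrived: deliver to the local port


def arbitrate(cur: Coord, requests: Dict[str, Coord]) -> Dict[str, str]:
    """Map each requested output port to the single winning input port.

    `requests` maps an input port name to the destination of the packet
    waiting at the head of that port's FIFO.
    """
    grants: Dict[str, str] = {}
    for in_port in PRIORITY:           # highest priority first
        if in_port not in requests:
            continue
        out_port = xy_route(cur, requests[in_port])
        grants.setdefault(out_port, in_port)   # first claimant keeps the port
    return grants
```

In this sketch, if the N and W inputs both hold packets that must leave through E, only N is granted; the W packet stays in its FIFO until a later cycle.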
FIFO module:
The router was designed in SystemVerilog HDL and is composed of two sub-modules and a top module. The
design first builds a synchronous FIFO that always presents a data word on its output line and moves to
the next value only when the pop signal is asserted. The FIFOs are interfaced to the arbitrator. Each
FIFO asserts a data valid signal when it is not empty, and a ready signal, indicating that it can accept
incoming data, when it is not full. The popped data is the input to the current arbitrator, and the data
push port takes the output data of the neighboring router. With these FIFOs, congestion between packets
was reduced to a large extent. As a result, the router is non-destructive: packets wait in the FIFOs
instead of being dropped.
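The FIFO behaviour described above (head word always visible, valid when not empty, ready when not full) can be modelled roughly as follows. This is a behavioural Python sketch rather than the actual RTL; the class name, the method names and the default depth of 4 are assumptions made for illustration.

```python
from collections import deque


class SyncFifo:
    """Behavioural model of a FIFO whose head word is always on the output."""

    def __init__(self, depth: int = 4):    # depth is an assumed value
        self.depth = depth
        self.items = deque()

    @property
    def data_valid(self) -> bool:
        """Asserted when the FIFO is not empty."""
        return len(self.items) > 0

    @property
    def ready(self) -> bool:
        """Asserted when the FIFO is not full."""
        return len(self.items) < self.depth

    @property
    def data_out(self):
        """Head entry, visible without popping; None when empty."""
        return self.items[0] if self.items else None

    def clock(self, push: bool = False, data_in=None, pop: bool = False) -> bool:
        """Advance one clock edge; flags are sampled before the edge.

        Returns True if the pushed word was accepted.
        """
        do_pop = pop and self.data_valid
        do_push = push and self.ready
        if do_pop:
            self.items.popleft()
        if do_push:
            self.items.append(data_in)
        return do_push
```

A rejected push (return value False) corresponds to the sender seeing ready de-asserted, so it must hold the packet and retry instead of losing it.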
1X1 Router:
The 1X1 router top module is composed of one arbitrator and five FIFOs. The FIFOs and the arbitrator
communicate using the following signals:
Data_pop = data_in (to arbitrator)
Data_push = routertobus_data (from neighbouring router)
Full = routertobus_ready (to neighbouring router; asserted when the FIFO is not full)
Empty = data_valid (to arbitrator; asserted when the FIFO is not empty)
Push = bustorouter_valid (from neighbouring router)
Pop = data_pop (from arbitrator)
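The handshake implied by these signals can be illustrated with a minimal Python sketch: a word moves across a link only in a cycle where the sender has valid data and the receiver is ready. The function name and the use of plain queues for the two FIFOs are assumptions; the comments name the report's signals only to show the intended correspondence.

```python
from collections import deque


def step(upstream: deque, downstream: deque, downstream_depth: int) -> bool:
    """Simulate one clock cycle of a valid/ready link between two routers.

    Returns True if a word was transferred in this cycle.
    """
    valid = len(upstream) > 0                     # plays the role of bustorouter_valid
    ready = len(downstream) < downstream_depth    # plays the role of routertobus_ready
    if valid and ready:
        downstream.append(upstream.popleft())     # push into the receiving FIFO
        return True
    return False                                  # sender keeps the word and retries
```

When the downstream FIFO is full, `step` returns False and the word stays in the upstream queue, which is how backpressure avoids destroying packets under congestion.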

Figure 1. 1X1 router internal composition


The major problems faced while writing the RTL were:
1) Designing the arbitration so that lower-priority packets requesting the same output port are held
back.
2) Avoiding the loss of packets under congestion.
3) Establishing proper communication between the FIFOs and the arbitrator.
4) Timing and synchronization of packet arrival and departure.

Optimization summary:
The 1X1 router is the fundamental building block of the on-chip router. Because it is instantiated many
times, optimizing it matters most. To make the on-chip router more efficient, we made the XY routing
logic try the X direction first; if the port in that direction is busy, the packet is routed along the Y
direction instead, provided it also needs to move in that direction.
The whole router was optimized for performance and speed. The delay used was only 2000 ps, and all data
arrives within this delay without negative slack. The other main aim was to use as few cells as possible,
so care was taken in the RTL coding to avoid redundant logic blocks.
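The X-first routing with a fallback to Y can be sketched as follows. This is an illustrative Python model under assumptions: the function name, the `busy` set of occupied output ports, and the orientation (x toward East, y toward North) are not taken from the report.

```python
from typing import Optional, Set, Tuple

Coord = Tuple[int, int]


def route_with_fallback(cur: Coord, dst: Coord, busy: Set[str]) -> Optional[str]:
    """Pick an output port: X first, Y only if X is busy and Y is useful.

    Returns None when no useful port is free, meaning the packet waits.
    """
    cx, cy = cur
    dx, dy = dst
    x_port = None if dx == cx else ("E" if dx > cx else "W")
    y_port = None if dy == cy else ("N" if dy > cy else "S")
    if x_port is None and y_port is None:
        candidates = ["L"]                        # already at the destination
    else:
        # X is tried before Y; a direction the packet does not need is skipped.
        candidates = [p for p in (x_port, y_port) if p is not None]
    for port in candidates:
        if port not in busy:
            return port
    return None                                   # stall in the input FIFO
```

For a packet at (0, 0) headed to (2, 2), the sketch returns "E" when E is free and "N" when E is busy. A packet headed to (2, 0) has no useful Y move, so it waits when E is busy.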

Router 4X4:
The optimized 1X1 router was used to build the required 4X4 router, which connects 16 processors.
Several architectures can be used to build a 4X4 router, such as crossbar, butterfly and torus. This
project develops the 4X4 router using a crossbar implementation. The advantage of the crossbar is that it
is very fast and routes any data packet quickly. A simple diagram of the implementation is shown below:

Figure 2. 4X4 router connection using 1X1 routers


We can see that the routers on the outer edges are connected to each other. This improves performance,
i.e. packets are routed faster to the far edges. The following hand-drawn image shows our design together
with the corresponding signals.
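The benefit of connecting the outer edges can be seen by counting hops. The sketch below assumes that the edge connections are wraparound links in both X and Y on a 4X4 grid; that reading of the figure, the coordinate scheme and the function names are assumptions, not details confirmed by the report.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]
N = 4  # routers per side in the 4X4 design


def neighbours(x: int, y: int, n: int = N) -> Dict[str, Coord]:
    """Neighbour coordinates of router (x, y) when the edges wrap around."""
    return {
        "E": ((x + 1) % n, y),
        "W": ((x - 1) % n, y),
        "N": (x, (y + 1) % n),
        "S": (x, (y - 1) % n),
    }


def hops(src: Coord, dst: Coord, n: int = N, wrap: bool = True) -> int:
    """Minimum number of router-to-router hops from `src` to `dst`."""
    total = 0
    for a, b in zip(src, dst):
        d = abs(a - b)
        total += min(d, n - d) if wrap else d   # wrap link may be shorter
    return total
```

Under these assumptions, a packet from corner (0, 0) to corner (3, 3) needs 2 hops with the edge connections and 6 hops without them.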

Figure 3. 4X4 router design with signal names

Optimization summary:
The design is optimized so that 16 processors are connected with the minimum number of routers. Our
implementation consumes less power than other implementations; this is reported in the literature and was
also verified in our project. See the results section for power and area details.

Automatic Place and Route:


The HDL code was synthesized, and the generated netlist and other files were loaded into Encounter.
Floor planning and power planning were done with optimal values. The pins were then placed so as to avoid
congestion in the design, and well taps and blockages were added. After that, the standard cells were
placed with a density of 0.3. Pre-CTS optimization completed without setup violations. CTS was done with
the given clocks. Post-CTS optimization showed a few hold violations, which were removed by inserting
buffers. NanoRoute was run until the number of DRC violations was minimal. Finally, fillers were added
and the design was exported to Cadence Virtuoso.
The problems faced were:
1) There were many hold time violations, so many iterations were needed to bring them to zero.
2) NanoRoute produced many DRC violations, so a large number of iterations were performed to minimize
them.

Layout Generation:
After the import into Cadence Virtuoso, a large number of recurring DRC errors appeared; these were
fixed. After DRC, LVS reported many mismatched cells. This may be due to bad cells and could be avoided
by placing good cells.

PrimeTime analysis:

Power and timing were analyzed using pt_shell. The measured power was 9.471 mW. The timing reports
showed no setup or hold violations.

Optimization assumptions vs. optimized results:

Before building our design, we expected that optimizing the 1X1 design would boost performance. We used
the XY algorithm to route the packets efficiently. After completing the design, however, we found that
this method works very well only when packets are distributed evenly across all processors. It would
perform badly if packets from many different cores all targeted one specific processor, although this
case is rare. From the literature we found that power savings increase with the number of crossbar ports
(see Figure 4). In the current design, however, we were unable to increase the number of crossbar ports
because of the pre-defined design specifications.
The 4X4 router was developed in a crossbar matrix pattern on the assumption that it would be fast within
a limited area. After completing the design, we found that the area could be reduced further by writing
the RTL in a more optimized way so that synthesis uses fewer cells.

Figure 4. Segmented crossbar power saving (relative to matrix crossbar)

Results:
This section shows simulation waveforms demonstrating that the RTL design meets the design
specifications, and that the RTL design, the synthesized design and the post-Encounter design produce the
same outputs.
The waveforms also show the 4X4 router at work. In the 4X4 router, data is generated by a local processor
and sent to the destination core. This routing can be seen at the local in port of the source router and
the local out port of the destination router.

Figure 5. 1X1 RTL design waveform

Figure 6. 1X1 synthesized waveform

Figure 7. 1X1 post encounter waveform

Figure 8. 4X4 router waveform
