Dec 14, 2009

© Attribution Non-Commercial (BY-NC)


Lab Report

Akshay Vijayashekar

akshay@student.chalmers.se

The main aim of the lab series is to understand and implement the stages in the physical design flow of a chip, from RTL to GDSII. During this process, we learn various software tools used in Electronic Design Automation (EDA) to design an IC. In the preparatory lab exercise, we begin by drawing a block diagram of a 32-bit ALU with building blocks such as shifters, an adder, multiplexers and registers. In the following exercises, we write an RTL description of the ALU using a ripple carry adder and a Sklansky adder. This design is tested against a given set of vectors and then synthesized. We make a detailed analysis of area and power consumption for different timing constraints. Finally, placement and routing of the design are done, completing the design flow.

In this exercise, the block diagram drawn in the preparatory exercise is coded in RTL using VHDL. We design two versions of the ALU RTL code, one based on a ripple carry adder (RCA) and the other based on a Sklansky adder. We also write a test bench in VHDL to verify the logic of the designed ALU. The VHDL description of the ALU is based on the block diagram submitted as part of the preparatory assignment. The test bench is used only for testing the ALU code and hence need not be synthesizable, i.e., it need not be realizable in hardware. Code that cannot be realized in hardware is called non-synthesizable code.

Next is the verification process, which is the most important step. Here we compile our VHDL code and test it against a given set of vectors using the ncsim simulator.

Results: The VHDL descriptions of both ALU designs, one using the ripple carry adder and one using the Sklansky adder, were successfully tested without any errors against all the test vectors provided.

Conclusions: The block diagram gives the designer a proper overview and understanding of what has to be designed. The VHDL code should be based on this overview block diagram. If this is done, we end up with a good design.

Testing the design is a crucial step in the design flow, because any bugs in the design can be removed early in the process. If these bugs are found during the later stages of the design flow, the cost of fixing them is much higher than the time and cost invested in testing the design during the initial stages. Functional verification checks whether the desired output values are obtained for a given set of inputs, and it is accomplished with a test bench. This exercise also shows the importance of test benches. Since the number of test vectors is very large, inspecting the waveforms to verify the results is not only tedious but also error-prone. Moreover, since the number of test vectors matches a real scenario, test benches provide a foolproof and less time-consuming method to test the design. If bugs are found, the specific faulty part can then be examined using waveform simulations.
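A self-checking test bench compares every output against its expected value automatically, instead of relying on someone reading waveforms. The Python sketch below only illustrates that comparison loop. The `(a, b, expected)` vector format and the 32-bit adder standing in for the design under test are hypothetical; they are not the lab's ALU or its vector files.

```python
from typing import Callable, List, Tuple

MASK32 = 0xFFFFFFFF

# Hypothetical vector format: (operand a, operand b, expected output).
Vector = Tuple[int, int, int]


def reference_add(a: int, b: int) -> int:
    """Hypothetical stand-in for the design under test: 32-bit wrap-around add."""
    return (a + b) & MASK32


def check_vectors(dut: Callable[[int, int], int], vectors: List[Vector]) -> List[str]:
    """Apply every vector to the DUT and collect one message per mismatch."""
    errors = []
    for index, (a, b, expected) in enumerate(vectors):
        actual = dut(a, b)
        if actual != expected:
            errors.append(
                f"vector {index}: a={a:#010x} b={b:#010x} "
                f"expected={expected:#010x} got={actual:#010x}"
            )
    return errors


if __name__ == "__main__":
    vectors = [(1, 2, 3), (MASK32, 1, 0), (0x7FFFFFFF, 1, 0x80000000)]
    problems = check_vectors(reference_add, vectors)
    print("all vectors passed" if not problems else "\n".join(problems))
```

Only the failing vectors are reported. That matches the workflow described above: the test bench flags a mismatch, and waveform simulation is then used on that specific case.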

This exercise mainly focuses on synthesizing the ALU design with the ripple carry adder under different timing constraints. The RTL code is mapped onto a 1.2 V, 130 nm process technology library to synthesize the design. We first synthesize without any timing constraint and then with stricter constraints, analysing the worst case path, the area required for implementation and the power consumption in each case. Finally, we verify the synthesized design, i.e., the gate-level netlist produced by the RTL compiler, to check that it matches the original VHDL code.

Results: As mentioned above, the ALU code is first synthesized without any timing constraint. Since we set no constraint, we obtain the intrinsic implementation timing, which can be used as a basis for further constraints. The timing report for my design gave a worst case delay of 5454 ps. The worst case path runs through the ripple carry adder from bit 0 of input register B to bit 30 of output register Outs. This is expected, because the carry has to ripple through the adder bit by bit. The worst case path contains D flip-flops, inverters, a non-inverting multiplexer, an AND-OR-INVERT gate, a 4-input NAND gate and many 1-bit full adders. The total implementation area is 13129.95 µm². As much as 69% of it is occupied by logic elements, while sequential elements account for 27%.

In the next assignment, we synthesize with a timing constraint of 2727 ps, i.e., 50% of the previously obtained value. This constraint was met with a timing slack of 3 ps. The worst case path still runs through the ripple carry adder, starting at bit 0 of register A and ending at bit 31 of the output register. Among many other cells, this path consists of D flip-flops, a 2-input NAND with an inverted input, an inverter and 1-bit full adders. Essentially, the worst case path has not changed much from the previous case, but faster gates have been used. To meet the timing constraint, the tool has selected gates with larger transistors, which have higher drive strength and lower resistance, trading area for speed: larger gates occupy more area but have smaller delays. The implementation area has increased to 15308.53 µm², of which logic elements take 71.7% and sequential elements 24.3%. We also notice that 9 buffers have been added, taking 0.4% of the total area.

We now provide an even stricter constraint of 1250 ps, i.e., 800 MHz. This time there is a timing violation with a slack of -1139 ps, so this design is not suitable for 800 MHz operation. The worst case path again runs through the ripple carry adder. The implementation area in this case is 15346.86 µm², slightly higher than in the previous case. Faster gates and flip-flops with low delays and fan-outs are chosen, and more buffers are added. The 24 buffers now take 1% of the area, and they also help to increase the speed. Even with the fastest gates available in the given 130 nm technology library, we are unable to meet the timing constraint.
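Simple arithmetic links the clock frequencies and slacks quoted above. The Python sketch below assumes that the reported slack equals the clock period minus the worst-path arrival time, with setup time and clock uncertainty already folded into the tool's figure. Under that assumption, it recovers the implied worst-path delay of each constrained RCA run.

```python
def period_to_mhz(period_ps: float) -> float:
    """Clock frequency in MHz for a period in picoseconds (f = 1e6 / T_ps)."""
    return 1e6 / period_ps


def arrival_from_slack(period_ps: float, slack_ps: float) -> float:
    """Assumed relation: slack = period - arrival, so arrival = period - slack."""
    return period_ps - slack_ps


# (timing constraint in ps, reported slack in ps) for the two constrained RCA runs.
runs = {"50% constraint": (2727.0, 3.0), "800 MHz target": (1250.0, -1139.0)}

if __name__ == "__main__":
    for name, (period, slack) in runs.items():
        arrival = arrival_from_slack(period, slack)
        status = "met" if slack >= 0 else "violated"
        print(f"{name}: target {period_to_mhz(period):.1f} MHz ({status}), "
              f"worst path ~{arrival:.0f} ps, i.e. ~{period_to_mhz(arrival):.1f} MHz max")
```

Under this assumption, the worst path of the 1250 ps run still takes about 2389 ps. The RCA-based ALU would therefore top out at roughly 419 MHz, well short of 800 MHz.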

Finally, the synthesized netlist is verified against the same set of vectors that was used to verify the VHDL code. The netlist synthesized with the 50% timing constraint (2727 ps) was successfully verified.

Conclusions: Static Timing Analysis (STA) calculates the delay of a circuit without simulating it, and it is used here to find the worst case timing path. Because STA uses simplified delay models, it is fast and still gives a reasonably accurate measurement of circuit timing. When the timing constraint is tightened, the ripple carry adder remains the bottleneck of the ALU, even with the fastest gates available in the technology library. Faster gates are selected for tighter constraints, but they consume more area. Transistor resizing and buffer insertion are used to achieve higher speeds.
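The core idea of STA is to propagate the latest arrival time through the circuit graph in topological order and then trace the critical path backwards. The Python sketch below does this on a toy netlist. The node names (loosely echoing the register-to-adder path above) and all gate delays are hypothetical. Real STA tools also model slew, load and interconnect, and this sketch ignores all three.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# Hypothetical gate delays in ps ("outs" stands for the output register's setup time).
delay_ps: Dict[str, float] = {
    "regB0": 120.0, "inv1": 30.0, "fa0": 180.0, "fa1": 180.0,
    "nand4": 60.0, "mux": 90.0, "outs": 50.0,
}
# Hypothetical connectivity: each node mapped to the nodes that drive it.
fanin: Dict[str, List[str]] = {
    "regB0": [], "inv1": ["regB0"], "fa0": ["regB0", "inv1"],
    "fa1": ["fa0"], "nand4": ["inv1"], "mux": ["fa1", "nand4"], "outs": ["mux"],
}


def worst_path(delays: Dict[str, float], preds: Dict[str, List[str]]) -> Tuple[float, List[str]]:
    """Longest-path STA: arrival(node) = max(arrival of drivers) + delay(node)."""
    arrival: Dict[str, float] = {}
    latest_driver: Dict[str, str] = {}
    for node in TopologicalSorter(preds).static_order():
        drivers = preds.get(node, [])
        if drivers:
            latest = max(drivers, key=lambda p: arrival[p])
            latest_driver[node] = latest
            start = arrival[latest]
        else:
            start = 0.0  # path starting point (e.g. a register output)
        arrival[node] = start + delays[node]
    # The endpoint with the largest arrival time terminates the critical path.
    end = max(arrival, key=arrival.get)
    path = [end]
    while path[-1] in latest_driver:
        path.append(latest_driver[path[-1]])
    return arrival[end], path[::-1]


if __name__ == "__main__":
    total, path = worst_path(delay_ps, fanin)
    print(f"worst case delay {total:.0f} ps via {' -> '.join(path)}")
```

No input vectors are needed. Every node is visited once, which is why STA is so much faster than simulation.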

Fast adders such as the Sklansky adder can overcome the RCA bottleneck of the previous design. The delay of this adder grows logarithmically with the number of bits, whereas the delay of the RCA grows linearly with the number of bits. In this lab exercise, we analyse how much the Sklansky adder improves the ALU compared with the RCA. We also perform a power analysis of the two designs.
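The linear-versus-logarithmic claim can be made concrete with a bit-level model. The Python sketch below is not the lab's VHDL; it assumes a carry-in of 0. Its "depth" counts only the carry-combining stages on the longest path: one per bit for the RCA, and one per prefix level for the Sklansky tree. The generate/propagate and final XOR stages, which both adders share, are left out.

```python
from typing import List, Tuple


def to_bits(x: int, n: int) -> List[int]:
    """Little-endian list of the lowest n bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def ripple_carry_add(a: int, b: int, n: int = 32) -> Tuple[int, int]:
    """Return (sum mod 2**n, carry-chain depth); bit i waits for the carry of bit i-1."""
    carry, total = 0, 0
    for i, (ai, bi) in enumerate(zip(to_bits(a, n), to_bits(b, n))):
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, n  # the carry passes through n full-adder stages in series


def sklansky_add(a: int, b: int, n: int = 32) -> Tuple[int, int]:
    """Return (sum mod 2**n, prefix depth) using a Sklansky parallel-prefix carry tree."""
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # propagate
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate
    G, P = g[:], p[:]  # group (G, P) of each bit, widened level by level
    levels, span = 0, 1
    while span < n:
        new_G, new_P = G[:], P[:]
        for i in range(n):
            if i & span:  # bit i sits in the upper half of its 2*span block
                j = (i & ~(span - 1)) - 1  # top bit of the lower half
                new_G[i] = G[i] | (P[i] & G[j])
                new_P[i] = P[i] & P[j]
        G, P = new_G, new_P
        levels += 1
        span *= 2
    # The carry into bit i is the group generate of bits 0..i-1 (carry-in assumed 0).
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, levels


if __name__ == "__main__":
    a, b = 0xFFFFFFFF, 0x00000001  # the carry has to travel across all 32 bits
    for name, adder in (("RCA", ripple_carry_add), ("Sklansky", sklansky_add)):
        s, depth = adder(a, b)
        print(f"{name}: sum={s:#010x}, carry-path depth={depth}")
```

For 32 bits, both adders return the same sum. However, the RCA carry passes through 32 stages, whereas the Sklansky tree needs only log2(32) = 5 prefix levels. That is the structural reason the Sklansky ALU handles tighter constraints, at the cost of more cells and wiring.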

Results: In the first exercise, we synthesize the Sklansky adder design and check its unconstrained intrinsic delay. My design had a worst case delay of 4805 ps, with the worst path running through the Sklansky adder from bit 0 of register B to bit 31 of register Outs. This delay is lower than that of the ripple carry adder design. The estimated area for this design is 13892.456 µm².

Next, we synthesize the Sklansky ALU design with a constraint of 1250 ps, i.e., for 800 MHz operation. This design meets the 800 MHz constraint, and the critical path has now shifted from the adder to the right shifter. The worst case path begins at bit 0 of register B and ends at bit 6 of register Outs. We have thus designed an adder that is no longer the bottleneck of the ALU. Among the 10 worst case paths, we also find the left shifter (positions 8 and 9) along with the adder. The area required for implementation is 16494.65 µm², far higher than in the unconstrained case. The reason is that gates with larger transistors, which have higher drive strength, are used to reduce delay, and this naturally increases the area in exchange for better performance. This process of changing the design to fulfil a given set of timing constraints is called timing closure.

Figure(1) plots the implementation area of both the Sklansky and the RCA designs. Notice that the area of the Sklansky design increases exponentially as the time constraint is reduced, while the area of the ripple carry adder design increases linearly. A tighter constraint therefore makes the Sklansky-based ALU use much faster and larger gates than the RCA-based ALU. Also, since the adder is no longer part of the critical path in the Sklansky-based ALU, the shifters are optimized using large gates. We also verify the Sklansky ALU design against the given set of test vectors, and the netlist passed without any errors.

We now look at the power consumed by both ALU models, using a probabilistic model for the analysis. The input high probability is set to 0.5. The toggle probability, i.e., how often an input switches between high and low, is set to 0.02 in the first case and to a higher value of 0.1 in the second. Figure(2) plots the values obtained for toggle probability 0.02. Under tighter constraints, the RCA ALU consumes more power than the Sklansky ALU, but the trend shows that the RCA ALU consumes less power at lower frequencies. Table(1) summarizes the power consumption values obtained for toggle probabilities 0.1 and 0.02. The dynamic power increases when the toggle rate is high, because the inputs switch between 1 and 0 more often, whereas the leakage power remains almost the same for both probabilities. At the level of individual blocks, the adder block consumes more than 10% of the total power.

We also check the power consumed by the clock tree. At 360 MHz, it consumes 162117.818 nW for the Sklansky ALU and 211706.182 nW for the RCA ALU. We also calculate the power theoretically with the formula P = f × C × VDD², using VDD = 1.2 V and C = 309 fF for the Sklansky ALU and C = 404 fF for the RCA ALU. The clock net switches every cycle, so no extra activity factor is needed. The calculated values match the tool results fairly well.
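The clock-tree numbers above can be reproduced with the standard dynamic-power relation P = α × C × VDD² × f. The Python sketch below assumes α = 1 for the clock net, since it charges and discharges once per cycle, and it takes the capacitances quoted in the text. In the chosen units (fF, V, MHz), the product comes out directly in nW.

```python
def dynamic_power_nw(c_ff: float, vdd_v: float, f_mhz: float, alpha: float = 1.0) -> float:
    """Dynamic power in nW: alpha * C * VDD**2 * f, with fF * V^2 * MHz = 1e-9 W."""
    return alpha * c_ff * vdd_v ** 2 * f_mhz


# Clock-tree capacitance in fF and tool-reported clock power in nW, from the text.
reported = {"Sklansky": (309.0, 162117.818), "RCA": (404.0, 211706.182)}

if __name__ == "__main__":
    for name, (c_ff, measured_nw) in reported.items():
        estimate = dynamic_power_nw(c_ff, vdd_v=1.2, f_mhz=360.0)
        diff_pct = 100.0 * (measured_nw - estimate) / measured_nw
        print(f"{name}: estimate {estimate:.1f} nW, tool {measured_nw:.1f} nW, "
              f"difference {diff_pct:.1f}%")
```

Both estimates (about 160186 nW and 209434 nW) land within roughly 1.2% of the tool's figures. The `alpha` parameter would be below 1 for data nets, which switch only on some cycles.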

So far, we have estimated the power consumption by assuming certain probabilities. A more realistic estimate can be obtained by feeding the test vectors used earlier for verification into the power analysis. The tools then determine the probabilities as well as the power consumed. Table(2) shows the power consumed in the three different cases. The dynamic power is high for the Random test vectors, which means that the inputs toggle much more often in this case than with the other test vectors. The power consumed for Realtrace (150000 test vectors) is comparable with that for Regular (1000 test vectors), despite the much higher number of test cases in the former. We cannot settle on a fixed number of test vectors for testing; the more test vectors, the higher the accuracy of the estimation.

[Figure(1): Area (y-axis) in µm² vs. time (x-axis) in ns]
[Figure(2): Power (y-axis) vs. time (x-axis) in ns]

                 Toggle probability 0.1      Toggle probability 0.02
Power (nW)       Leakage      Dynamic        Leakage      Dynamic
RCA              349153.1     1939050        349056.6     1512384
Sklansky         331687.1     1794628        331241.2     1395077

Table(1) Power consumed in nW for different toggle probabilities

Test vectors     Leakage       Dynamic
Random           331642.572    1054265
Realtrace        338729.899    522640.8
Regular          333278.364    540620.4

Table(2) Power consumed in nW for test vectors

In the Toggle Count Format (.tcf) file, Clk has a probability of 0.5 for Realtrace and 0.4998 for the other test vectors. The number given after the probability is the number of times the value switches between 0 and 1. Table(3) lists these values for bits A[15] and A[16] in all three cases. Notice that for the Regular test vectors, A[15] has a probability of 0, which means it never goes to state 1. The A.tv file confirms this: bit 15 remains zero throughout the test vectors. We can also see that A[16] toggles 0-1 and 1-0, which justifies the 0.5002 probability shown in the .tcf file.

                 Random                  Realtrace               Regular
                 A[15]       A[16]       A[15]       A[16]       A[15]    A[16]
Probability      0.4873      0.5022      0.1555      0.0614      0        0.5002
Toggle count     24267000    24664000    8243000     5307000     0        49677000

Table(3) Probability and toggle count of A[15] and A[16] for different test vectors
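The two .tcf quantities discussed above can be computed directly from a stream of input words. The Python sketch below makes three assumptions: each vector occupies one equal time slot, the probability is the fraction of slots in which the bit is 1, and a toggle is any 0-to-1 or 1-to-0 change between consecutive slots. The vector stream is made up, and the tool may count toggles over its own time base, so absolute counts need not match Table(3).

```python
from typing import List, Tuple


def bit_activity(words: List[int], bit: int) -> Tuple[float, int]:
    """Return (probability of the bit being 1, number of 0<->1 transitions)."""
    values = [(w >> bit) & 1 for w in words]
    probability = sum(values) / len(values)
    toggles = sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)
    return probability, toggles


if __name__ == "__main__":
    # Hypothetical A-operand stream: bit 15 is never set, bit 16 alternates every vector.
    a_vectors = [0x00010000 if k % 2 else 0x00000000 for k in range(1000)]
    for bit in (15, 16):
        prob, count = bit_activity(a_vectors, bit)
        print(f"A[{bit}]: probability {prob:.4f}, toggle count {count}")
```

A bit that is never set reports probability 0 and zero toggles, which is the pattern Table(3) shows for A[15] with the Regular vectors. An alternating bit reports a probability of 0.5.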

Conclusions: The selection of the best design depends on the set of constraints required by the manufacturer or the application. For our ALU, the Sklansky-based design suits applications that need high speed and can afford more area, while the RCA-based design is better suited where speed is not important and area is the main criterion. The graphs also show this. Power consumption can be another criterion for selecting among the available designs.

The ALU we have designed and tested is now just one step away from being sent to production. This step is the placement and routing of the design, where the different cells are placed in the chip area and wires are routed between them. It is usually time-consuming and, in practice, is repeated a number of times. We first form various blocks, such as the input and output register blocks, and then proceed to floorplanning. Floorplanning is done manually: taking the die area into account, we arrange the different blocks, such as the adder, the input and output registers and the shifters, on the cell area. It is good practice to place macros in the corners or along the perimeter. The input block is placed along the left edge, while the output block is placed along the right edge. Another important consideration is the placement of time-critical blocks. The area utilization of the various blocks should be between 60% and 85% for good results. Power routing is then done by adding power rings and stripes. This step is very important: if it is not planned well, voltage drops and noise degrade performance. Although all the cells could in principle be placed manually along the rows of the chip area, doing so is highly inefficient and time-consuming, and with all the constraints involved it is practically infeasible. So only the floorplanning is done manually, and the result is fed to the place and route EDA tools. We use the SoC Encounter tool in Guide mode to place the cells. The netlist generated in the previous lab exercise is used for this exercise.
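The 60-85% utilization guideline translates directly into a core size, since utilization = standard-cell area / core area. The Python sketch below sizes a square core for a target utilization. The cell area in the example is an assumption (the constrained Sklansky netlist area reported earlier), and row height, I/O and power-ring margins are ignored.

```python
import math


def square_core_side_um(cell_area_um2: float, utilization: float) -> float:
    """Side of a square core such that cell_area / core_area == utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return math.sqrt(cell_area_um2 / utilization)


if __name__ == "__main__":
    cell_area = 16494.65  # um^2, assumed netlist area for illustration
    for u in (0.60, 0.70, 0.85):  # spanning the 60-85% guideline
        side = square_core_side_um(cell_area, u)
        print(f"utilization {u:.0%}: core about {side:.1f} um x {side:.1f} um")
```

Lower utilization leaves more room for routing, buffers and upsized cells. Higher utilization saves die area but risks congestion.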

Results: We first run pre-place optimization and then in-place optimization, which improves the design performance by reducing delays and correcting capacitance violations. The placement density increases from 73% to 86% after in-place optimization, indicating that the cells now occupy more of the available area. Clock Tree Synthesis (CTS) is then performed on the placed layout; in this step, the clock is routed to the different parts of the chip layout. We first run the pre-CTS step, and the timing report for my design showed a violation with a slack of -0.993 ps. The post-CTS step only made the timing worse, increasing the violation to -1.064 ps.