Dynamic timing analysis verifies circuit timing by applying test vectors to the circuit. This approach is an extension of simulation and ensures that circuit timing is tested in its functional context. It reports timing errors that functionally exist in the circuit and avoids reporting errors that occur in unused circuit paths. The most common dynamic timing analysis is the so-called min-max analysis method. Under min-max timing analysis, both the minimum and maximum delays of circuit components are used to generate outputs, which are ranges (the spread between the earliest and latest data arrival) instead of single edges. Since outputs are in turn fed into other inputs, managing (merging) these ranges can become very complex. Because both the min and max versions of every delay must be used, the simulation is also extremely slow.
Another major issue with dynamic timing analysis is incomplete coverage. It only checks circuitry that is exercised by the test stimulus, which may leave critical paths untested and timing problems undiscovered. It is also not path oriented. Since dynamic timing analysis reports errors on a certain pin at a certain time, the user must trace through the schematic to locate the path that caused the problem, which is difficult for large designs. Finally, this method requires development time for test vectors. Dynamic timing analysis tools often track more information than logic simulators, which makes them slower. Also, each component must have both timing information and a functional model before timing verification can proceed. This could prevent the use of new parts that do not yet have functional models.
It should be noted that min-max simulation is not currently used in the industry. Instead, either functional simulation with timing (timing simulation) or formal verification methods are typically used to verify complex IC designs.
Typically people use the max version of delays to verify the circuit works under worst-case timing (no setup issues) and min version of the delays to verify best-case timing (no hold issues).
Since dynamic timing analysis performs a simulation, it can use the same stimulus as a logic simulation. Because the stimulus functionally exercises the design, false errors on unused or uninteresting paths are not reported. Note that a timing simulation reports results differently from a logic simulation. A logic simulation reports results as edge times, whereas a timing simulation reports results as regions of ambiguity. The results of a timing simulation do not specify exactly when an event occurs. Instead, they specify a range of time in which the event can occur.
Disadvantages:
1. It can report false errors.
2. It cannot detect timing errors related to logical operation.
Static timing analysis is similar to a manual analysis process, except that it is automated. This allows the design to be analyzed much faster, which makes it possible for a designer to experiment with different synthesis options and constraints in a short time. The method is also complete, because it traces and evaluates all paths in a design, not just those exercised by test stimulus. Because static timing analysis does not perform logic simulation, test stimulus and functional models are not required. This makes static analysis available earlier in the design flow, since no development time for stimulus and models is needed.
The modeling requirements for a static analysis tool are relatively simple. However, timing information for each component in the design is required, and the designer must specify waveform information about the input data and clock signals the design uses. The component timing information can be found in parts libraries or data books. Such timing information typically includes pin-to-pin delays, setup and hold time specifications, signal inversion information, and clock frequency constraints. Clock and data waveforms are a normal requirement of the design process and do not require additional development time.
The major drawback of a static timing analysis tool is that it reports false errors. By checking all possible paths in a design, static timing analysis ensures that all possible setup and hold violations in the circuit are found. However, because circuit behavior is not considered during the analysis, some of the reported errors may be false. Static analysis tools also cannot detect timing errors related to logical operation. Because static timing analysis does not perform functional testing, it cannot detect timing errors, such as race conditions, that depend on the logical operation of the circuit.
Timing Models
Static timing analysis tools typically use timing models at the logic primitive level. The timing parameters are typically similar among different timing tools. The following are some of the common timing parameters for primitive logic gates, flip-flops and latches.
rise time
fall time
Propagation delay time is the time between the specified reference points on the input and output voltage waveforms, with the output changing from one defined level (high or low) to the other defined level. Propagation delay time up is the time between the specified transition reference points on the input and output voltage waveforms, with the output changing from the low level to the defined high level. Propagation delay time down is the time between the specified transition reference points on the input and output voltage waveforms, with the output changing from the high level to the defined low level.
[Figure: Propagation delay time measurement, inputs A and B, output Z, showing time up and time down]
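As a rough illustration of the definitions above, the sketch below measures a propagation delay from sampled input and output waveforms. It assumes that the reference points are the 50% crossings of a normalized 0-to-1 swing and interpolates linearly between samples. The waveforms, the 0.5 threshold, and the function names are all hypothetical.

```python
def crossing_time(times, values, level=0.5, rising=True):
    """Return the first time the sampled waveform crosses `level` in the given direction.

    Linear interpolation is used between adjacent samples.
    """
    samples = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        crossed = v0 < level <= v1 if rising else v0 > level >= v1
        if crossed:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the reference level")


def propagation_delay(times, v_in, v_out, in_rising, out_rising, level=0.5):
    """Delay from the input reference crossing to the output reference crossing."""
    t_in = crossing_time(times, v_in, level, in_rising)
    t_out = crossing_time(times, v_out, level, out_rising)
    return t_out - t_in


# Hypothetical inverting gate: input crosses 0.5 at t = 1, output crosses 0.5 at t = 4.
times = [0, 1, 2, 3, 4, 5, 6]
v_in = [0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0]
v_out = [1.0, 1.0, 1.0, 1.0, 0.5, 0.0, 0.0]
t_down = propagation_delay(times, v_in, v_out, in_rising=True, out_rising=False)
```

Here `t_down` is 3.0. This is the propagation delay time down, because the output changes from high to low.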
Setup time is the time interval between a specified transition reference point of the data input signal and a specified transition reference point of the clock input signal. Setup time is specified as the shortest interval for which correct operation of the flip-flop is guaranteed. Hold time is the time interval between a specified transition reference point of the clock input signal and a specified transition reference point of the data input signal. Hold time is also specified as the shortest interval for which correct operation of the flip-flop is guaranteed. Propagation delay time (clock to Q) is the time between the specified transition reference points on the clock input and data output voltage waveforms, with the output changing from one defined level (high or low) to the other defined level.
[Figure: Edge-triggered flip-flop timing measurements, clock C, data D and output Q, showing setup time, hold time and C-to-Q delay]
Given a system performance requirement, often a maximum frequency, a designer typically wants to know whether the design will operate reliably under the given system timing environment. The environment includes the input signal arrival times as well as the required departure times of the output signals (also referred to as stable time and required time). Most timing analysis tools also provide a tracing capability for debugging purposes. Typically a source and a destination are selected, and the tool traces all the paths between them, or a subset of them selected by a threshold delay value.
Timing Environments
Most design modules are specified to meet certain performance goals. For synchronous digital circuits, the most common parameters used to describe the timing environment include the system clock frequencies, input arrival times, output required times, output loads, input loads, and drive strengths on the input side. The following diagram shows typical design environment settings:
[Figure: typical design environment settings]
Inputs (left): set drive or set drive cell, set resistance, set capacitance, set max capacitance, set max transition, set max fanout, set input delay
Design (middle): set operating conditions, set wire load, create clock (definition), set max area, set multicycle paths, set false paths, set max delay paths, set min delay paths
Outputs (right): set max capacitance, set max transition, set max fanout, set output delay
The left-hand side parameters specify the timing environment on the inputs, and the right-hand side parameters specify the timing environment on the outputs. The ones in the middle are typically parameters for the design itself. Refer to the notes on how to set design constraints using Synopsys Design Compiler.
2. Register to register paths: this type of path can be constrained by defining the clock(s) for the registers.
3. Register to primary output paths: this type of path can be constrained by defining the clock for the register and setting an output delay relative to a clock on the output port (departure time).
4. Primary input to primary output paths: this type of path can be constrained by setting an input delay on the input port (arrival time) and minimum and/or maximum delays required at the output (departure time).
Path Analysis
Path analysis is the most fundamental type of analysis and is used as the basis for slack analysis, critical path identification and timing model generation (e.g., extracting chip-level timing models for board-level timing analysis).
Example: Determine the path delays for the following circuit segment:
rise=(8,10) fall=(3,5)
rise=(6,8) fall=(2,4)
rise=(7,9) fall=(4,6)
Path delays for the above path:
tRmin = 8 + 2 + 7 = 17ns
tRmax = 10 + 4 + 9 = 23ns
tFmin = 3 + 6 + 4 = 13ns
tFmax = 5 + 8 + 6 = 19ns
Example: Determine the path delay for the following circuit segment:
rise=(8,12) fall=(4,6)
rise=(6,8) fall=(2,4)
rise=(8,12) fall=(4,6)
For the above simple path A to Z:
tRmin = 8 + 6 + 4 = 18
tRmax = 12 + 8 + 6 = 26
tFmin = 4 + 2 + 8 = 14
tFmax = 6 + 4 + 12 = 22
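The min/max sums above can be automated by walking the path backward from its output. A gate's rise and fall delays are indexed by its own output transition, so an inverting gate swaps the transition seen at its input. That is why each sum mixes rise and fall terms. The original figures are lost. The gate polarities below are therefore assumptions, chosen only so that the sums reproduce the two worked examples. The `Gate` and `path_delays` names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Gate:
    rise: tuple[float, float]  # (min, max) delay when the gate OUTPUT rises
    fall: tuple[float, float]  # (min, max) delay when the gate OUTPUT falls
    inverting: bool            # True if an input rise produces an output fall


def path_delays(gates: list[Gate]) -> dict[str, float]:
    """Return tRmin, tRmax, tFmin, tFmax at the path output.

    `gates` is listed from the path input to the path output.
    """
    result = {}
    for out_edge in ("R", "F"):
        lo = hi = 0
        edge = out_edge  # transition at the output of the gate being visited
        for gate in reversed(gates):
            dmin, dmax = gate.rise if edge == "R" else gate.fall
            lo += dmin
            hi += dmax
            # Step back to this gate's input: an inverting gate flips the transition.
            if gate.inverting:
                edge = "F" if edge == "R" else "R"
        result[f"t{out_edge}min"] = lo
        result[f"t{out_edge}max"] = hi
    return result


# Assumed polarities; the first gate's polarity does not affect the sums.
ex1 = path_delays([Gate((8, 10), (3, 5), False), Gate((6, 8), (2, 4), True), Gate((7, 9), (4, 6), True)])
ex2 = path_delays([Gate((8, 12), (4, 6), False), Gate((6, 8), (2, 4), True), Gate((8, 12), (4, 6), False)])
```

With these assumptions, `ex1` gives 17/23/13/19 and `ex2` gives 18/26/14/22, matching the hand calculations.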
A timing analyzer calculates the delay of a path by tracing from the starting point of the path to its ending point, cumulatively adding delays along the way. The longest path is the path with the largest delay from start point to end point. The shortest path is the path with the smallest delay from start point to end point. When calculating the longest and shortest paths, the clock-to-Q delay of a flip-flop is also included in the calculation.
Example: Given that the inverter g1 has a delay of 20ns, the AND gate g2 has a delay of 40ns, the AND gate g3 has a delay of 30ns and the OR gate g4 has a delay of 30ns, calculate the longest and shortest paths from A to F:
[Figure: inputs A, B and C; inverter g1 (20ns), AND gates g2 (40ns) and g3 (30ns), OR gate g4 (30ns); internal nets D, E and H; output F]
After exhaustively tracing all the paths from A to F:
The longest path: A → g1 → D → g2 → H → g4 → F = 20 + 40 + 30 = 90ns
The shortest path: A → g3 → E → g4 → F = 30 + 30 = 60ns
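The exhaustive trace can be sketched as a memoized search over the net graph, where each edge carries the delay of the gate it passes through. The figure is lost, so the connectivity in `EDGES` is inferred from the two paths listed in the example. Treat it, and the `extreme_path` helper, as hypothetical. Passing `max` or `min` selects the longest or the shortest path.

```python
from functools import lru_cache

# Net graph: net -> list of (next net, gate, gate delay in ns). Inferred connectivity.
EDGES = {
    "A": [("D", "g1", 20), ("E", "g3", 30)],
    "B": [("H", "g2", 40)],
    "C": [("E", "g3", 30)],
    "D": [("H", "g2", 40)],
    "E": [("F", "g4", 30)],
    "H": [("F", "g4", 30)],
    "F": [],
}


def extreme_path(src, dst, pick):
    """Return (delay, nets) of the longest (pick=max) or shortest (pick=min) src-to-dst path."""

    @lru_cache(maxsize=None)
    def best(node):
        if node == dst:
            return (0, (dst,))
        options = []
        for nxt, _gate, delay in EDGES[node]:
            sub = best(nxt)
            if sub is not None:  # None means dst is unreachable from nxt
                options.append((delay + sub[0], (node,) + sub[1]))
        return pick(options, key=lambda option: option[0]) if options else None

    return best(src)
```

For example, `extreme_path("A", "F", max)` yields `(90, ("A", "D", "H", "F"))` and `extreme_path("A", "F", min)` yields `(60, ("A", "E", "F"))`. Memoization keeps each net's sub-result from being recomputed, which matters once paths reconverge.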
Timing Characterization
In the following scenarios, a design/module needs to be characterized:
1. Hierarchical timing analysis
2. For reuse
3. Feasibility studies
4. A custom block
A typical design/module can be fully characterized (timing-wise, for verification) with the following parameters:
1. Max internal frequency
2. Setup and hold time requirements for the first-level flip-flop elements with respect to primary inputs
3. Clock-to-output delays for the last-level flip-flop elements
4. Primary input to primary output delays
The following sections show how these timing parameters can be calculated:
[Figure: example circuit with outputs OUT1 and OUT2]
The minimum clock period would then be 10 + 90 + 5 = 105ns. The maximum frequency is 1/(minimum clock period) = 1/105ns ≈ 9.5MHz.
The following formulas can be used to calculate the setup time and hold time at the chip level:
setup time = (longest data path delay) - (shortest clock path delay) + (setup time of register)
hold time = (longest clock path delay) - (shortest data path delay) + (hold time of register)
Since the data path and the clock path can be independent of each other, the setup and hold times of the chip-level model can be either positive or negative. However, the sum of the setup time and hold time for one pair of clock and data paths should be greater than or equal to zero. This follows because the sum equals (longest - shortest data path) + (longest - shortest clock path) + (register setup time + register hold time). The setup and hold values of a sequential logic element such as a FF can be derived in a similar way using transistor-level models with C and R.
Example: In the circuit below, calculate the setup and hold times at the primary inputs with the following given information. The setup time of the registers is 10ns. The hold time of the registers is 5ns. The longest and shortest paths from clock to FF1 and FF2 are both 20ns. The longest and shortest paths from in1 to D1 are both 100ns. The longest and shortest paths from in2 to D1 are both 40ns.
[Figure: example circuit with primary inputs in1 and in2, clock input clk, registers FF1 and FF2 (pins D1, Q1), and output out1]
The setup and hold time calculations with respect to register FF1 are:
Setup time of port in1 to port clk: 100 - 20 + 10 = 90ns
Hold time of port in1 to port clk: 20 - 100 + 5 = -75ns
Setup time of port in2 to port clk: 40 - 20 + 10 = 30ns
Hold time of port in2 to port clk: 20 - 40 + 5 = -15ns
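The two chip-level formulas can be captured in a small helper and checked against the in1/in2 numbers. `chip_setup_hold` is a hypothetical name. The sketch covers a single data/clock path pair and also checks the rule that setup plus hold must not be negative.

```python
def chip_setup_hold(data_max, data_min, clk_max, clk_min, t_setup, t_hold):
    """Chip-level setup and hold times of a data port relative to the clock port."""
    setup = data_max - clk_min + t_setup  # longest data - shortest clock + register setup
    hold = clk_max - data_min + t_hold    # longest clock - shortest data + register hold
    return setup, hold


# Values from the example (ns): register setup 10, hold 5, clock paths 20.
in1 = chip_setup_hold(data_max=100, data_min=100, clk_max=20, clk_min=20, t_setup=10, t_hold=5)
in2 = chip_setup_hold(data_max=40, data_min=40, clk_max=20, clk_min=20, t_setup=10, t_hold=5)

# Setup + hold for one data/clock pair must not be negative.
for setup, hold in (in1, in2):
    assert setup + hold >= 0
```

Both pairs sum to 15ns, the register's own setup plus hold. Here the longest and shortest paths are equal, so both spread terms are zero.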
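Clock-to-output delays (Tco) can be calculated using:
Tco = (tcp) + (tcq) + (tcom)
where tcp is the clock path delay to the flip-flop, tcq is the clock-to-Q delay of the flip-flop, and tcom is the combinational delay from Q to the output port.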
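[Figure: clock Clk reaching a flip-flop through clock path delay Tcp; flip-flop clock-to-Q delay Tcq; combinational delay Tcom to outputs Out1, Out2 and Out3]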
Note that in the above drawing there are three sets of clock-to-output delays, one set for each of the output ports Out1, Out2 and Out3. Each clock-to-output pair consists of the following delays (assuming the flip-flop is rising-edge triggered):
1. Max clock to output for data rising
2. Min clock to output for data rising
3. Max clock to output for data falling
4. Min clock to output for data falling
Similarly, each primary input to primary output pair is characterized by four delays:
1. Max input to output rise (using the longest path and max component delays)
2. Min input to output rise (using the shortest path and min component delays)
3. Max input to output fall (using the longest path and max component delays)
4. Min input to output fall (using the shortest path and min component delays)
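A minimal sketch of the four clock-to-output values for one output port (Tco = tcp + tcq + tcom) follows. It assumes the logic between Q and the output is non-inverting, so a rising output comes from a rising Q. Every delay value and name in it is hypothetical, and each argument is a (min, max) pair.

```python
MIN, MAX = 0, 1


def clock_to_output(tcp, tcq_rise, tcq_fall, tcom_rise, tcom_fall):
    """Return the four Tco values (Tco = tcp + tcq + tcom) for one output port.

    Every argument is a (min, max) tuple. The path from Q to the output is
    assumed non-inverting, so output rise pairs with Q rise.
    """
    return {
        "max_rise": tcp[MAX] + tcq_rise[MAX] + tcom_rise[MAX],
        "min_rise": tcp[MIN] + tcq_rise[MIN] + tcom_rise[MIN],
        "max_fall": tcp[MAX] + tcq_fall[MAX] + tcom_fall[MAX],
        "min_fall": tcp[MIN] + tcq_fall[MIN] + tcom_fall[MIN],
    }


# Hypothetical delays in ns for a single output such as Out1.
tco_out1 = clock_to_output(tcp=(2, 3), tcq_rise=(4, 6), tcq_fall=(3, 5),
                           tcom_rise=(7, 9), tcom_fall=(6, 8))
```

The clock path term is the same for rise and fall because the flip-flop is assumed to be clocked by the rising clock edge in both cases. Only the Q and logic delays depend on the data transition.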
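[Figure: slack analysis example. Required times are propagated backward from output L, and slack values (required - arrival) are annotated on nets B, C, D, H, I, J and K, e.g. 15 - 8 = 7, 5 - 3 = 2, 20 - 23 = -3, 10 - 13 = -3 and 15 - 18 = -3]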
The required time at output L is propagated backward to derive the required time for each component and net. The slack time is the difference between the required time and the actual data arrival time. The slack numbers are labeled on the nets, and it can be seen that the path with the worst slack is the critical path. Only two types of slack analysis are covered in detail here: setup slack and hold slack. Setup slack analysis determines whether data arrives and is valid at the input of a synchronous device before the input clock arrives. Hold slack analysis determines whether data remains at the input of a synchronous device long enough to be clocked into the device.
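The backward propagation described above can be sketched as two passes over a topologically sorted net graph. A forward pass finds the latest arrival times, a backward pass finds the earliest required times, and slack = required time - arrival time on every net. The graph, delays and required time below are hypothetical, since the original figure is lost. The sketch uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def net_slacks(edges, arrival_in, required_out):
    """Return {net: required - arrival} for every net.

    edges: (from_net, to_net, delay) triples. arrival_in gives primary-input
    arrival times; required_out gives primary-output required times.
    """
    fanin, fanout = {}, {}
    for u, v, d in edges:
        fanin.setdefault(v, []).append((u, d))
        fanout.setdefault(u, []).append((v, d))
    nets = set(fanin) | set(fanout)
    # TopologicalSorter takes {node: predecessors}; static_order() yields drivers first.
    preds = {n: [u for u, _ in fanin.get(n, [])] for n in nets}
    order = list(TopologicalSorter(preds).static_order())

    arrival = {}
    for n in order:  # forward pass: latest arrival time
        arrival[n] = arrival_in[n] if n not in fanin else max(arrival[u] + d for u, d in fanin[n])

    required = {}
    for n in reversed(order):  # backward pass: earliest required time
        required[n] = required_out[n] if n not in fanout else min(required[v] - d for v, d in fanout[n])

    return {n: required[n] - arrival[n] for n in order}


# Hypothetical netlist in which a -> c -> d -> out is the critical path.
EDGES = [("a", "c", 3), ("c", "d", 5), ("b", "d", 5),
         ("a", "e", 2), ("d", "out", 4), ("e", "out", 4)]
slack = net_slacks(EDGES, arrival_in={"a": 0, "b": 0}, required_out={"out": 10})
```

The nets with the worst slack (-2) are a, c, d and out, which form the critical path. Net e, off the critical path, has a slack of 4.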
From the above diagram (assuming the clock is not gated), it can be seen that the clock edge of interest includes one clock period if an absolute time scale is used. For the setup check, the latest data arrival with respect to the clock edge is the third transition on the data. For the hold check, the relevant data transition is the fourth transition, which happens after the clock edge.
If the data path is too fast, the fourth transition shifts to the left (earlier in time), toward the clock edge. This is why fast data causes hold violations.
[Figure: clock and data waveforms showing the data launch edge for setup at t=0 and the data capture edge for setup]
If the time reference point is chosen at the active clock edge of the destination register, it can be seen from the diagram that the clock delay must be compensated by a clock period for the setup check. Otherwise the wrong edge will be compared. However, the transition for the hold check happens after the clock edge, so it is not necessary to include the clock period in that case. This also explains why the clock period is not included in either case for the hold slack check. The choice of reference (t=0) therefore affects only the setup check, not the hold check, in single-cycle data transfer.
[Figure: source register and destination register separated by combinational logic (C.L.), both driven by the clock]
For setup analysis, the latest-arriving data is used, and this data is launched by the previous active edge of the clock. For hold analysis, the check is made after the active edge at the destination register, and the hold slack is calculated with respect to the same clock edge as seen by the source register.
[Figure: clock at the source register, marking the edge whose launched data is used for the setup check and the edge whose launched data is used for the hold check]
[Figure: source reg → C.L. → dest. reg]
When the source register and destination register do not have the same phase, there can be three cases: they are totally out of phase, the destination is ahead of the source register or the source register is ahead of the destination register.
[Figure: destination clock with the source launching edge leading by T/2, T/2 + t and T/2 - t]
The left-pointing arrows point to the launching active clock edges of the source registers. The right-pointing arrows point to the active clock edges used for the hold time check. In each case, the setup and hold slack calculations need to be adjusted. When the source and destination registers are out of phase (the source edge is T/2 ahead of the destination clock edge), the following can be used to calculate the setup and hold slacks:
setup slack = T/2 + minimum clock path - maximum data path - setup
hold slack = minimum data path - maximum clock path - hold + T/2
When the active edge of the source register is (T/2 + t) ahead of the active edge of the destination clock, the following can be used to calculate the setup and hold slacks:
setup slack = T/2 + t + minimum clock path - maximum data path - setup
hold slack = minimum data path - maximum clock path - hold + T/2 - t
When the active edge of the source register is (T/2 - t) ahead of the active edge of the destination clock, the following can be used to calculate the setup and hold slacks:
setup slack = T/2 - t + minimum clock path - maximum data path - setup
hold slack = minimum data path - maximum clock path - hold + T/2 + t
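The three cases differ only in how far the launching edge leads the capturing edge, so they can be folded into one helper. This is a sketch under one assumption, consistent with the equations above: the hold check uses the next source edge, which follows the destination edge by T minus the lead. `slack_with_offset` and the sample numbers are hypothetical.

```python
def slack_with_offset(period, launch_lead, clk_min, clk_max, data_min, data_max, t_setup, t_hold):
    """Setup and hold slack when the source edge leads the destination edge by `launch_lead`.

    launch_lead = T/2 for out-of-phase clocks, T/2 + t or T/2 - t for the shifted
    cases, and T for a same-phase, single-cycle transfer.
    """
    setup_slack = launch_lead + clk_min - data_max - t_setup
    # The next launching edge occurs (period - launch_lead) after the capture edge.
    hold_slack = data_min - clk_max - t_hold + (period - launch_lead)
    return setup_slack, hold_slack


# Hypothetical delays: T = 40, clock path 8..14, data path 19..37, setup 10, hold 5.
same_phase = slack_with_offset(40, 40, 8, 14, 19, 37, 10, 5)
out_of_phase = slack_with_offset(40, 20, 8, 14, 19, 37, 10, 5)
```

With `launch_lead = T`, the helper reduces to the single-cycle equations (setup slack = T + min clock - max data - setup, hold slack = min data - max clock - hold). Moving the launch edge to T/2 trades setup margin for hold margin.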
[Figure: registers R1A and R2A clocked by clkA, with combinational logic (C.L.)]
The following diagram shows setup and hold relations for R1A to R2B and R1B to R2A:
[Figure: setup and hold relations between clkA and clkB edges for R1A to R2B and R1B to R2A]
All the above diagrams assume that a single clock cycle is used to launch and capture the data. If it is known that multiple clock cycles are needed for data to travel from one register to another, the setup and hold relations will be different. The following two examples use the delta delay method, so the clock period is used in the setup slack analysis.
Example 1: This example shows the setup margin calculation for a simple circuit. All delays are given as (min, max). The timing parameters are as follows. The clock period is 40. For both flip-flops, setup = 10 and hold = 5, and the clock-to-Q rise and fall delays are (4,15) and (3,12). The NOR gate has output rise and output fall delays of (3,15) and (4,11). The buffers have output rise and output fall delays of (4,7) and (2,8).
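[Figure: Setup slack for data fall on the second register. din → FF1; FF1 Q (rise (4,15), fall (3,12)) → NOR gate b4 → second register → dout; clock → buffer b1, which drives buffer b2 (clock of FF1) and buffer b3 (clock of the second register); b1, b2 and b3 each have rise (4,7) and fall (2,8)]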
In this case, both the data path and the clock path originate from the clock port. The first buffer b1 is common to both the data path and the clock path. Since the setup margin takes the difference between the data path and the clock path, we may choose either the maximum or the minimum delay for the common gates. The only requirement is that the same value is used in both paths, so that it cancels in the difference. We choose tRmin = 4 for this case.
First, calculate the paths for a data fall transition at the input of the destination register:
The minimum clock path = tRmin(b1) + tRmin(b3) = 4 + 4 = 8
The maximum data path = tRmin(b1) + tRmax(b2) + CQ(FF1)Rmax + tFmax(b4) = 4 + 7 + 15 + 11 = 37
Setup slack (data fall) = T + minimum clock path - maximum data path - setup = 40 + 8 - 37 - 10 = 1ns
When calculating the setup slack for data rise, the clock path is the same, but the data path delay is different. Because the NOR gate b4 inverts, a rising input at the destination register comes from a falling output of FF1. Different values are therefore needed for register FF1 and the NOR gate b4.
The minimum clock path = tRmin(b1) + tRmin(b3) = 4 + 4 = 8
The maximum data path = tRmin(b1) + tRmax(b2) + CQ(FF1)Fmax + tRmax(b4) = 4 + 7 + 12 + 15 = 38
Setup slack (data rise) = T + minimum clock path - maximum data path - setup = 40 + 8 - 38 - 10 = 0ns
Example 2: This example shows the hold margin (slack) calculation for the same circuit:
[Figure: Data hold slack (margin) for data fall; same circuit and delays as Example 1, with din, FF1, NOR gate b4 and clock buffers b1, b2, b3]
For data fall on register FF2, we have the following numbers:
The maximum clock path = tRmax(b1) + tRmax(b3) = 7 + 7 = 14
The minimum data path = tRmax(b1) + tRmin(b2) + CQ(FF1)Rmin + tFmin(b4) = 7 + 4 + 4 + 4 = 19
Hold slack (data fall) = minimum data path - maximum clock path - hold = 19 - 14 - 5 = 0ns
For data rise on register FF2, the delay values for block b4 and register FF1 have to be different. We have the following numbers:
The maximum clock path = tRmax(b1) + tRmax(b3) = 7 + 7 = 14
The minimum data path = tRmax(b1) + tRmin(b2) + CQ(FF1)Fmin + tRmin(b4) = 7 + 4 + 3 + 3 = 17
Hold slack (data rise) = minimum data path - maximum clock path - hold = 17 - 14 - 5 = -2ns
There is therefore a hold violation in the given circuit, and a static timing analyzer should report this problem.
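Examples 1 and 2 can be reproduced with the sketch below. It encodes the delay table from Example 1 and treats the NOR gate b4 as inverting, so a falling D pin on FF2 comes from a rising Q on FF1. It applies the same value of the common buffer b1 to both the clock and data paths: tRmin for setup and tRmax for hold. The dictionary layout and the `margins` name are hypothetical.

```python
from __future__ import annotations

# Delays are (min, max) pairs indexed by the element's OUTPUT transition ("R" or "F").
BUF = {"R": (4, 7), "F": (2, 8)}       # buffers b1, b2, b3
NOR = {"R": (3, 15), "F": (4, 11)}     # NOR gate b4 (inverting)
FF1_CQ = {"R": (4, 15), "F": (3, 12)}  # clock-to-Q of FF1
T, SETUP, HOLD = 40, 10, 5
MIN, MAX = 0, 1


def margins(data_edge: str) -> tuple[int, int]:
    """Setup and hold slack at FF2's D pin for a rising ("R") or falling ("F") transition."""
    # b4 inverts, so the D-pin transition comes from the opposite transition on FF1's Q.
    q_edge = "F" if data_edge == "R" else "R"

    # Setup: the common buffer b1 takes the same value (tRmin) in both paths.
    b1 = BUF["R"][MIN]
    clk_min = b1 + BUF["R"][MIN]  # b1 + b3
    data_max = b1 + BUF["R"][MAX] + FF1_CQ[q_edge][MAX] + NOR[data_edge][MAX]  # b1 + b2 + FF1 + b4
    setup_slack = T + clk_min - data_max - SETUP

    # Hold: the common buffer b1 takes the same value (tRmax) in both paths.
    b1 = BUF["R"][MAX]
    clk_max = b1 + BUF["R"][MAX]  # b1 + b3
    data_min = b1 + BUF["R"][MIN] + FF1_CQ[q_edge][MIN] + NOR[data_edge][MIN]
    hold_slack = data_min - clk_max - HOLD
    return setup_slack, hold_slack
```

`margins("F")` returns `(1, 0)` and `margins("R")` returns `(0, -2)`, matching the hand calculations, including the hold violation for data rise.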
[Figure: data path from a primary input and clock path converging at a flip-flop (FF)]
The equations for setup and hold slack are the same. However, the paths start from the primary inputs (specified by set_input_delay in the case of Synopsys tools):
Setup slack = clock period + minimum clock path - maximum data path - setup
Hold slack = minimum data path - maximum clock path - hold
[Figure: primary input to primary output path through combinational gates]
The following equation can be used:
Slack = required data arrival time - actual data arrival time
On the input side, the data arrival time is needed, and on the output side, the data required time is needed.
The equation is:
Slack = required data arrival time - actual data arrival time
When calculating the path delay, the clock path delay, the clock-to-Q delay of the FF and the combinational path delay are all added to get the actual data arrival time.
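[Figure: false path example with inputs din, sel and clk; path segments of delay 40, 40, 20 and 20 leading to the D input of a flip-flop]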
In this case, the tool needs to know that the topological longest path (40 + 40) cannot actually occur. It can determine this with a justification algorithm like the one discussed with the D-Algorithm for test vector generation. Alternatively, the path can be de-selected manually. Note that at least two points are needed to specify the de-selection of a path. In other cases, the path is feasible, but it may take more than one clock cycle to propagate a signal along it. A multiple cycle path is typically characterized by a starting point, an ending point and a number of mid-points. The number of cycles is not inherently limited to an integer, but most tools do not allow a fraction of a cycle. Multiple cycle paths typically cannot be detected by the tool automatically, so the designer needs to specify all of them before timing analysis. False paths and multiple cycle paths are also referred to as timing exceptions.
If data from Stage 1 arrives at Ta: Slack = Tb - Ta (no violation).
If data from Stage 1 arrives at Tc: Slack = 0, and time borrowing (cycle stealing) occurs. Time borrowed = Tc - Tb.
If data from Stage 1 arrives at Td: Slack = Te - Td, which is negative (violation).
[Figure: latch-based pipeline timing with times Ta to Te, showing data launching from Stage 2, the time being borrowed, and the Stage 3 latch capture for Stage 2]
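The three arrival cases can be expressed as a small classifier. The figure is lost, so the sketch assumes Tb is the time the capturing latch opens and Te is the latest time data may arrive before it closes. The borrowed amount reported for a violation (the full window) is also an assumption, as are the function name and the sample times.

```python
def latch_check(arrival, t_open, t_close):
    """Return (slack, borrowed) for data arriving at a transparent latch.

    t_open is when the latch opens (Tb); t_close is the latest allowed arrival (Te).
    """
    if arrival <= t_open:  # arrives before the latch opens: positive slack, no borrowing
        return t_open - arrival, 0
    if arrival <= t_close:  # arrives while transparent: borrow time from the next stage
        return 0, arrival - t_open
    # Arrives too late: negative slack (assumed: the whole window has been borrowed).
    return t_close - arrival, t_close - t_open


# Hypothetical window: latch opens at Tb = 5, data must arrive by Te = 10.
early = latch_check(3, 5, 10)   # like Ta
inside = latch_check(7, 5, 10)  # like Tc
late = latch_check(12, 5, 10)   # like Td
```

The first case gives slack Tb - Ta = 2. The second gives slack 0 with Tc - Tb = 2 borrowed. The third gives slack Te - Td = -2, which is a violation.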
The first simple example below shows time-borrowing concepts in some detail. The second shows the same example with the latches replaced by edge-triggered flip-flops. Note: time borrowing typically affects only the setup slack calculation, since time borrowing delays data arrival times. The hold slack calculation uses the fastest data, so time borrowing typically does not affect it.
Example 1 (zero cycle data transfer): Clock G waveform and timing environment:
[Figure: clock G waveform, time axis marked 0, 5, 10]
Latch timing: G2Q = 0.18, D2Q = 0.16, setup = 0.08, hold = 0.07
[Figure: data timing diagram over one clock cycle (time axis 0, 2, 3, 4, 5, 10) showing the valid data window]
D to L1: Data arrives at latch L1 at t = 2. Timing is met with Tborrow = 2 (slack = 0).
L1 to L2: Next, the same data arrives at the L1 output at t = 2 + 0.16 = 2.16. It continues to arrive at latch L2 at t = 2.16 + 1.15 = 3.31. Timing is met with Tborrow = 3.31 accumulated (slack = 0).
L2 to L3: The same data arrives at the L2 output at t = 3.31 + 0.16 = 3.47. It continues to arrive at latch L3 at t = 3.47 + 0.03 = 3.5. Timing is met with Tborrow = 3.50 accumulated (slack = 0).
L3 to Q: Here, data is assumed to be needed for the next clock cycle, so it must be available at Q by T - 2 = 8. Since time borrowing occurred, data arrives at Q at t = 3.5 + 0.16 = 3.66. The slack time at Q is 8 - 3.66 = 4.34.
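The latch walkthrough can be replayed with a short sketch that pushes the data through transparent latches. Each hop adds the D2Q delay and then the logic delay to the next latch. The hop delays (1.15 and 0.03), D2Q, setup and the Q required time come from Example 1. The opening and closing times (t = 0 and t = 5) are assumptions read from the clock G waveform ticks, and `latch_chain` is a hypothetical name.

```python
def latch_chain(t_in, comb_delays, d2q, t_open, t_close, setup, required_q):
    """Propagate data through transparent latches (zero-cycle transfer).

    Returns the arrival time at each latch D pin, the time borrowed at each latch
    (measured from the opening edge), the arrival time at Q, the slack at Q, and
    whether every arrival meets the closing edge minus setup.
    """
    arrivals = [t_in]
    for comb in comb_delays:  # latch D-to-Q, then logic to the next latch
        arrivals.append(arrivals[-1] + d2q + comb)
    t_q = arrivals[-1] + d2q  # the last latch passes the data on to Q
    borrowed = [a - t_open for a in arrivals]
    met = all(a <= t_close - setup for a in arrivals)
    return arrivals, borrowed, t_q, required_q - t_q, met


# Example 1 numbers; latches assumed open from t = 0 to t = 5, Q required at T - 2 = 8.
arrivals, borrowed, t_q, slack_q, met = latch_chain(
    t_in=2.0, comb_delays=[1.15, 0.03], d2q=0.16, t_open=0.0, t_close=5.0,
    setup=0.08, required_q=8.0)
```

Rounded to two decimals, the arrivals are 2.0, 3.31 and 3.5, which are also the accumulated borrowed times. Q is reached at 3.66, leaving a slack of 4.34.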
Example 2 (single cycle data transfer): Clock G waveform and timing environment:
[Figure: clock G waveform, time axis marked 0, 5, 10]
D flip-flop timing: G2Q = 0.16, setup = 0.11, hold = 0.05
[Figure: data timing diagram over one clock cycle (time axis 0, 2, 3, 4, 5, 10) showing the valid data window]
D to F1: Data arrives at flip-flop F1 at t = 2. Adding the setup time at F1 gives 2.0 + 0.11 = 2.11. Data is captured on the next cycle. Slack time = 10 - 2.11 = 7.89.
F1 to F2: Data starts from G (clock) and arrives at the F1 output at t = 0.16. It continues to arrive at the F2 input at t = 0.16 + 1.15 = 1.31. Adding the setup time of F2 gives 1.31 + 0.11 = 1.42. Data is captured on the next cycle at t = 10. Slack time = 10 - 1.42 = 8.58.
F2 to F3: Data arrives from F2 at t = 0.16 + 0.03 = 0.19. Adding the setup time of F3 gives 0.19 + 0.11 = 0.30. Data is captured on the next cycle at t = 10. Slack time = 10 - 0.30 = 9.7.
F3 to Q: Data is captured at Q one clock cycle later. Part of the clock period (2 in this case) is allotted to the off-chip data transfer, so the data required time is 8. The only delay here is the G2Q delay (0.16). Slack time = 8 - 0.16 = 7.84.
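In the flip-flop version, each hop reduces to the same subtraction: capture time minus (launch time + logic delay + setup). The four slacks can therefore be recomputed with a one-line helper. The helper name and the dictionary layout are hypothetical, and the numbers are those of Example 2. As in the walkthrough, no setup time is applied at the output Q.

```python
def ff_stage_slack(launch, comb_delay, setup, capture):
    """Setup slack of one single-cycle hop: capture time minus (arrival + setup)."""
    return capture - (launch + comb_delay + setup)


T, CLK2Q, SETUP = 10.0, 0.16, 0.11
stage_slack = {
    "D to F1": ff_stage_slack(2.0, 0.0, SETUP, T),        # input data valid at t = 2
    "F1 to F2": ff_stage_slack(CLK2Q, 1.15, SETUP, T),
    "F2 to F3": ff_stage_slack(CLK2Q, 0.03, SETUP, T),
    "F3 to Q": ff_stage_slack(CLK2Q, 0.0, 0.0, T - 2.0),  # 2 units reserved for off-chip transfer
}
```

Unlike the latch version, no time is borrowed. Every stage starts from its own clock edge and is checked independently against the next edge at t = 10 (or t = 8 at Q).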