
An Innovative Flow To Implement Large Scale Design Changes In The Final Stages Of Physical Implementation

Manoj Kumar Dadhich
Freescale Semiconductor India
manoj.dadhich@freescale.com

Amit Bandlish
Univ. of Southern California, USA
bandlish@usc.edu

Vajeed Nimran
IIT Mumbai, India
vajeed@iitb.ac.in

ABSTRACT

A technique to implement large scale RTL Engineering Change Orders (ECOs) in System-on-Chip (SoC) designs at their final stages of physical implementation is presented. The technique aims at minimizing the design cycle time for implementing critical-path ECOs that affect large parts of a design, so that designers are able to incorporate such changes without causing a major schedule impact. The proposed flow was successfully implemented and tested on a 90nm wireless Application Specific Integrated Circuit (ASIC) design that was a first-pass success on silicon.

Keywords
ECO, clock tree synthesis, Sea-of-Gates, optimization

1. INTRODUCTION
Baseband SoC design cycle times for 90 nanometer designs with a complexity of millions of gates can typically take six to eight months. Of this time, about three to four months are spent in freezing design specifications, getting functional Register-Transfer-Level (RTL) designs and generating gate-level netlists which are optimized for timing. Effectively, the physical design activities such as clock tree synthesis, routing and timing closure thus have about two to four months to complete. Physical implementation is also a stage where logic bug frequency is at its peak [1]. Given such a scenario, a physical design engineer is invariably hard pressed for time to close on any modification of initial conditions, such as an ECO, especially when the change involved is fairly large.

In addition, as post-route verification generally progresses in tandem with the routing activities, there may be design changes up to the very last stage of physical design. Design requirements may necessitate changes in an embedded IP, say, or a critical bug that needs to be fixed may be discovered at the verification stage. Given the aggressive time-to-market schedules and customer demands, implementing large scale incremental changes or complete re-synthesis of affected modules in a design close to timing closure can become a physical designer's nightmare. As is evident, any large change would impact not only the timing and placement of logic but also the design's clock distribution structure and routing, and pose new noise and signal integrity problems.

In the case of a small scale change, an ECO can be accomplished by simply modifying the input netlist and following a traditional verilog-ECO flow to incorporate the changes in a completely routed design. Most conventional ECO flows are successful in handling design changes of up to, say, a few cells through verilog editing, provided the changes do not significantly affect other parts of the design. If, however, the scale of change is extremely large, or the design is too complex for such fixes to be done manually, re-synthesis becomes necessary and this approach fails.

The aim of the paper is to propose a flow for implementing such a large scale change in designs which are at a very late stage of their design cycle, while having minimal impact on other parts of the design. The paper is organized as follows: Section 2 describes the traditional ECO flow and its limitations. Section 3 addresses these limitations and describes the proposed ECO flow in detail. We share some experimental results in Section 4 and present our conclusion in Section 5.

2. CONVENTIONAL ECO FLOW AND ITS LIMITATIONS


As already emphasized, traditional ECO flows work best when changes are small. Typically, an ECO is done by a front-end designer making manual changes to a netlist, which is then used for incremental placement of the new logic. Instances associated with this and any other affected logic are then incrementally routed. The routed design is then extracted and goes through static timing checks. The complete flow for implementing an ECO in the conventional manner is shown in figure 1.

[Flowchart: Manual Netlist Modification -> New Netlist + Old DEF (Physical information) -> ECO Place new instances -> ECO Route new nets -> Parasitic Extraction -> Static Timing Analysis (STA)]

Figure 1: Conventional ECO Flow

There are some major limitations of this flow in implementing large scale ECOs. Firstly, it is impractical to manually modify netlists having a large number of changes. In this case, there would be no option but to resynthesize the gate-level netlist from the module's RTL. Secondly, since most ECOs involve a small logic change, most incremental placement algorithms supported by commercially available tools are congestion driven rather than timing driven. This can become a major drawback when implementing a large-scale RTL design change, as the incremental logic may end up having new timing violations. Most importantly, if a considerable part of the design has been modified, it would inevitably lead to changes in the clock trees and routing topology.

Current tools for synthesis, such as Physical Compiler (PC) from Synopsys, do have ECO capabilities but in practice work well only on changes of the order of 2-4% of the design. Beyond this, incremental synthesis and placement capabilities fail to provide optimal results from both the timing and the routing perspective, e.g. through creation of localized over-congestion. Apart from the usual timing checks, designs in the nanometer scale face deep-submicron (DSM) issues such as signal integrity (SI), viz. delay and functional noise, as well as electromigration and DFM, which depend largely on route topologies. Therefore, in a routed design which is delay and glitch noise clean and has its DSM issues addressed, implementing such an ECO involves not only ensuring that timing is met but also that this is accomplished with minimal routing changes, so that the noise analysis and repair cycle time after the ECO implementation does not become prohibitively large.

Another important consideration is the power dissipation or leakage in the chip. This factor assumes particular importance in wireless devices, which have strict limits on power dissipation to increase battery life. Since incremental placement algorithms are intrinsically not as good as full-blown algorithms, an incremental placement is bound to be left with some timing violations. It is possible that many of these violations may not be fixable with the limited placement area available for the ECO. Such violations can then be fixed by using a multi-vt flow in which standard-vt cells, which are faster than the high-vt cells but have higher leakage, are used. Therefore, while implementing the ECO, the physical designer must also take into account the number of standard-vt cells that may need to be added to close on timing. It is imperative to keep the use of such cells to a minimum.

3. PROPOSED ECO FLOW

In order to address the aforementioned limitations, we propose an ECO flow for efficiently implementing large scale design changes. The design under consideration is the applications processor (AP) platform of a 90 nanometer wireless ASIC with a gate count of approximately 1.4 million equivalent placeable NAND gates in the Sea-of-Gates (SoG). The die size is approximately 24.144 mm^2 with a rectilinear floorplan (see Figure 2 below and Table 1).

Figure 2: Applications Processor (AP) Platform

DESIGN PARAMETERS                  VALUES
No. of Placeable Instances         346293
No. of Nets                        356501
No. of hard blocks                 15
No. of IO Pins                     4833
Chip Size                          2.6948e+07 um^2
Core Size                          1.4931e+07 um^2
Utilization                        72%
No. of Clock Domains               40
Maximum Frequency of Operation     399 MHz

Table 1: AP Design Specifications

The flow was successfully implemented on two modules of this platform, sdma (light gray) and scmfbc (dark gray) (refer to figure 2). Sdma constitutes about 15% of the SoG area of the AP platform and is one of the most timing critical modules of the platform, working at 133 MHz. Scmfbc is comparatively smaller at about 2% of the SoG area and also works at 133 MHz. See Tables 2 and 3 below.

SCMFBC DESIGN PARAMETERS           VALUES
No. of Cells                       6204
No. of Nets                        7083
No. of hard blocks                 0
Utilization                        65%
Maximum Frequency of Operation     133 MHz
No. of Clock Domains               2

Table 2: SCMFBC Design Specs

SDMA DESIGN PARAMETERS             VALUES
No. of Cells                       52896
No. of Nets                        56041
No. of hard blocks                 2
Utilization                        70%
Maximum Frequency of Operation     133 MHz
No. of Clock Domains               6

Table 3: SDMA Design Specs

The basic steps of the proposed ECO flow are outlined below (see figure 3):
1. Re-synthesis of the module in which the ECO has been implemented
2. ECO Placement of the module using First Encounter (FE) from Cadence Design Systems
3. Timing Optimization in FE
4. Routing with minimal changes to the rest of the design
5. Post-Route Timing Optimization

[Flowchart blocks: New RTL + Timing Constraints + Physical Information; Optimized Gate-level Netlist; Region Based Timing Driven Placement; Partial Clock Tree Synthesis; Pre-Route Timing Optimization; Restoration of Original Clock and Signal Routes; Incremental Routing; Post-Route Timing Optimization; Parasitic Extraction and Static Timing Analysis]

Figure 3: Proposed ECO Flow

3.1. Re-synthesis of the module in which the ECO has been implemented
The first step is to synthesize a gate-level netlist from the RTL level design. For this, Synopsys Design Compiler was used and synthesis was done in the prototyping mode. Information on the physical attributes of the design, such as utilization and frame aspect ratios, was provided for synthesis. The timing constraints required for standalone synthesis of the affected module were characterized from the platform-level constraints. Using these inputs, the new gate-level netlist for the module was generated and then plugged in to the original netlist, after removing the original instantiation.

3.2. ECO Placement of the module using FE

The netlist from the synthesis tool has only the affected module changed, with the rest of the design untouched. The first requirement is to place all the logic of the updated module such that the existing placement for the other parts of the design can be reused while also meeting the intra-module timing. As incremental placement by the synthesis tool (PC) did not provide completely satisfactory results in terms of timing and routing, there was a need to optimize the placement of the module to close on timing quickly without introducing new routing violations. The following approach was adopted, using First Encounter from Cadence.

From the database prior to the ECO, the approximate area occupied by the logic of the changed module was calculated. A placement guide was defined within that region and the logic of the ECOed module was assigned to that guide. A placement guide acts as a kind of soft bound for placing the new module instances. All other instances belonging to the rest of the already placed design were fixed to prevent any movement during ECO placement. After assignment, a full-blown placement was run to place the module in the given bound.

For a successful timing driven placement in FE, it was important to achieve a good timing correlation between FE and the sign-off timing analyzer (PrimeTime(TM)). In order to accomplish that, a correlation flow for the two tools was developed. Parasitics were extracted under similar conditions and with similar reduction parameters from both FE and the sign-off extractor (StarRCXT(TM) from Synopsys). The extracted parasitics were then compared in the comparison tool Ostrich (Cadence) to obtain RC correlation factors to be applied to the synthesis tool's internal extractor before initiating the timing driven placement. Using this method, we were able to achieve a correlation of up to 95% between the two.

After correlation, the next step was to read the design constraints for guiding the timing engine while performing logic optimization within the placement guide. NanoRoute(TM) (Cadence) being our sign-off router, the route estimates it made during timing driven placement were closer to the final routes and thus gave more accurate results than those estimated by PC. Therefore, by using this approach we were able to obtain a starting placement that was not only good timing-wise but also optimized for routing.
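As an illustration of how an RC correlation factor of this kind could be derived, the sketch below fits a single least-squares scaling factor between per-net capacitances reported by two extractors. It is only a minimal sketch under stated assumptions: the function name `scale_factor`, the net names and the capacitance values are hypothetical, and the paper does not describe how Ostrich computes its factors or how the 95% correlation figure is defined.

```python
from typing import Dict


def scale_factor(tool: Dict[str, float], signoff: Dict[str, float]) -> float:
    """Return k minimizing sum((k * tool[n] - signoff[n])**2) over shared nets.

    Hypothetical helper: a plain least-squares fit through the origin,
    not the method used by any particular commercial tool.
    """
    shared = tool.keys() & signoff.keys()
    num = sum(tool[n] * signoff[n] for n in shared)
    den = sum(tool[n] ** 2 for n in shared)
    if den == 0:
        raise ValueError("no overlapping nets with non-zero values")
    return num / den


# Illustrative per-net total capacitance in fF (made-up values).
fe_cap = {"n1": 10.0, "n2": 20.0, "n3": 40.0}
signoff_cap = {"n1": 11.0, "n2": 22.0, "n3": 44.0}

# Factor by which the implementation tool's estimates would be scaled
# so that they track the sign-off extractor more closely.
cap_factor = scale_factor(fe_cap, signoff_cap)
print(f"capacitance scaling factor: {cap_factor:.3f}")
```

The same fit could be repeated for resistance, giving one factor per parasitic type; in practice separate factors per routing layer or net-length bin may be needed, which this sketch does not attempt.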

3.3. Clock tree synthesis

Arguably, the most important factor that ensures quick timing closure in a complex design is the quality of the clock distribution network. The quality of a clock tree can be gauged on the following factors: timing, immunity to noise, power efficiency and its utilization impact. The earlier in the design cycle the clock tree can meet all these requirements, the quicker the timing closure on the rest of the design. Knowing this, preserving the existing clock trees as much as possible was an important requirement, to save the time and effort that would otherwise have been spent on a complete re-synthesis of the affected clocks. To avoid complete clock tree re-synthesis, a partial clock tree synthesis approach was attempted for the clocks affected by the new module. For this, the first step is to find common or convergent points in the clock trees which are feeding the ECOed module. Starting from the root clock pin, this is the lowest level of the clock tree feeding only the changed module. This point is illustrated in figure 4.

[Schematic with a MUX and groups of flip-flops (FF)]

Figure 4: Clock tree Convergence Point

With information on the pre-ECO delays in these subtrees, only these sections of the clock tree need to be re-synthesized, such that they meet the pre-ECO delays. In this way, almost all clock nets in other sections of the design can be preserved, and iterations for timing as well as noise closure on these clock nets can be reduced to a minimum. Routing of these new clock nets is then done in the ECO Route mode, details of which are explained in the following sections.

3.4. Pre-route timing optimization

After timing driven placement of the module in FE, one round of timing optimization is required to fine-tune the placement. As a start, all transition violations in intra-module nets are selectively fixed. This helps to reduce the magnitude of setup violations caused by capacitive loading from extremely long routes. It also ensures that the tool does not spend unnecessarily long times trying to optimize logic in paths which have large interconnect delays. For fixing transition, all nets of the modified module that were violating transition constraints were found and then isolated for fixing using the timing optimization commands built in to FE [2].

After fixing transition violations in this module, the next step is to fix setup violations in the clock domains which affect the changed module. Timing constraints for the design were read in and all clock domains were defined. From these, the clock domains that either start from or end at the flops of the changed module were selected. These selected source and destination clock domains were then defined in FE for selective optimization. To ensure that fixing setup violations in these domains did not deteriorate timing on other domains, all paths common to other unaffected clock domains were excluded from optimization. Timing optimization was then performed on these domains.

As already mentioned, an important consideration during logic optimization is the constraint on leakage current. In order not to exceed the design requirements on this parameter, only high-vt cells were used for optimization. Though this restriction did have a slightly negative impact in terms of the number of components added for final optimization, we were able to stay within the defined leakage limits and fix almost all setup violations.
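The clock domain selection used for selective setup optimization can be sketched as follows. This is a minimal sketch of one possible interpretation, not the FE commands used in the paper: the `TimingPath` record, the instance names and the rule that a path is kept only when both of its clocks are affected are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class TimingPath:
    """Hypothetical record of one register-to-register timing path."""
    start_flop: str      # hierarchical name of the launching flop
    end_flop: str        # hierarchical name of the capturing flop
    launch_clock: str
    capture_clock: str


def in_module(inst: str, module_prefix: str) -> bool:
    """True if the instance sits under the given hierarchy prefix."""
    return inst.startswith(module_prefix + "/")


def affected_clocks(paths: List[TimingPath], module_prefix: str) -> Set[str]:
    """Clocks that launch from or capture at flops of the changed module."""
    clocks: Set[str] = set()
    for p in paths:
        if in_module(p.start_flop, module_prefix):
            clocks.add(p.launch_clock)
        if in_module(p.end_flop, module_prefix):
            clocks.add(p.capture_clock)
    return clocks


def paths_to_optimize(paths: List[TimingPath], module_prefix: str) -> List[TimingPath]:
    """Keep paths whose clocks are all affected; drop any path that is
    shared with an unaffected clock domain (assumed interpretation)."""
    clocks = affected_clocks(paths, module_prefix)
    return [p for p in paths
            if p.launch_clock in clocks and p.capture_clock in clocks]


# Made-up example data; "ap/sdma" stands in for the changed module.
paths = [
    TimingPath("ap/sdma/f1", "ap/sdma/f2", "clk_a", "clk_a"),
    TimingPath("ap/sdma/f3", "ap/uart/f1", "clk_a", "clk_b"),
    TimingPath("ap/uart/f2", "ap/uart/f3", "clk_b", "clk_b"),
    TimingPath("ap/core/f1", "ap/uart/f4", "clk_a", "clk_b"),
]
affected = affected_clocks(paths, "ap/sdma")
selected = paths_to_optimize(paths, "ap/sdma")
print(sorted(affected), len(selected))
```

In this made-up data only `clk_a` touches the changed module, so the two paths that capture on `clk_b` are left out of optimization even though one of them starts inside the module.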

3.5. Routing
Clock nets, being the most timing critical, are generally routed before signal nets to ensure the shortest routes and minimum detouring. While routing the ECO clock nets too, all efforts should be made to ensure that the new clock nets are routed with these attributes while minimally affecting the clock routes of the rest of the design. To accomplish this, the following approach was used. After completing the partial clock tree synthesis on the module as well as pre-route timing optimization and transition fixing, all unchanged clock nets from the pre-ECO routed design were imported to the ECOed design, leaving out the signal nets. The new clock nets were then selectively routed on this design. Any routing violations caused by these nets with existing clock routes were then fixed in ECO mode to ensure minimal routing changes. See figure 5.

[Flowchart: CTS Implemented -> Remove Existing Routing -> Import pre-ECO clock nets -> Route new clock nets -> Correct clock net DRCs in ECO mode and fix route topologies -> Import pre-ECO signal nets -> Incrementally route signal nets -> Final Routed Design]

Figure 5: Routing Flow

Once these violations were fixed, the route geometries of all clock nets were fixed to avoid further changes when new signal nets were routed. The next step was to import the routing information of all the signal nets from the pre-ECO design to this database and route them incrementally to honor existing topologies as much as possible. Using this methodology, we were able to ensure that very few, if any, new functional noise violations were introduced (see Table 4), which again was instrumental in quick timing closure.

3.6. Post-route timing optimization

Since there is always a difference between the pre-route estimates and actual post-route delays, one round of timing optimization is necessary for transition and setup fixing at the post-route stage. For fixing any remaining violations in the post-route stage, essentially the same steps are followed as in the pre-route stage. The salient difference is that, since routes are in place, actual post-route delays are now used by the optimization engine instead of delay estimations based on a trial route. In addition, optimization is now done based on actual clock tree delays, i.e. clocks are propagated, as opposed to the ideal clocks used in pre-route optimizations.

4. EXPERIMENTAL RESULTS
Our proposed ECO flow was implemented on the applications processor (AP) platform of a 90 nanometer wireless ASIC. The AP platform has about 20 different peripherals. The ECO to be implemented involved a change in one of the processor cores of the platform as a result of a functional bug found at the post-layout verification stage. The change being major (17% of SoG) and occurring at a stage where the design was close to timing closure, the traditional flow would necessitate a complete re-synthesis of the SoG, which would mean a 5-6 week schedule impact. However, with our ECO flow, we were able to successfully implement the change, saving about 4.5 weeks in the process (see Table 4).

Metrics                                                        SCMFBC       SDMA
No. of Placeable Instances                                     6204         52896
WNS at Initial Placement Stage                                 ~4ns         ~5ns
WNS after fully automated Timing Optimization in FE            -1.5ns       -1.2ns
No. of New Violating paths in the rest of the design           0            0
No. of Violating paths (after optimization)                    30           700
No. of Standard-Vt cells used to fix these violations          <200         <300
Schedule Impact with traditional flow                          3-4 weeks    3-4 weeks
Schedule Impact with Proposed flow                             1 week       1.5 weeks
No. of Routing violations before ECO                           ~200
No. of Routing violations after ECO                            ~230
No. of Functional Noise Violations before ECO                  ~6
No. of Functional Noise Violations after ECO                   ~10

Table 4: Experimental Results

The results can be divided into three main heads:

4.1. Timing results

The scmfbc module, after initial placement by PC, started with setup violations of the order of 4ns. With our flow, we were able to reduce these violations to less than 1.5ns. The remaining negative slack was fixed through the use of fast standard-vt cells. The normal implementation, involving synthesis from scratch, would take approximately 2 weeks; with our flow this was accomplished in five days. Similar results were seen on the sdma module, which was much bigger and more timing critical. We were able to close it at 1.5ns negative slack for setup through this flow, and the time saving on this module was even greater, at about 3 weeks. Overall, this flow helped in closing timing on complex modules well ahead of the expected time.

4.2. Routing and Functional noise results


By ensuring minimal changes to both clock and signal routes through this flow, we were able to minimize the emergence of new routing-related violations. The number of DRC violations at the end of the flow was almost the same as on the fully routed design we started with. Crosstalk noise was also kept to a minimum by the two-step clock routing and the ECO signal routing. This kept the iterations needed to close on noise-induced functional and timing violations to a minimum.

4.3. Power saving


As mentioned in the flow, it was our endeavor to restrict the use of std-vt cells, which consume more idle power and hence add to overall power dissipation and reduce battery life. With our flow, we were able to reduce the number of std-vt cells used to less than a third in the case of scmfbc, and even fewer in the case of sdma.

5. CONCLUSION
The methodology developed in this paper, to implement large scale ECOs in a design in its last stage of physical implementation, was proven on a 90 nanometer wireless ASIC that was a first-pass success on silicon. Using this technique, we were able to incorporate a change affecting approximately 17% of the Sea-of-Gates area of the design while reducing the overall time for ECO implementation by more than 65% over traditional ECO flows.

6. REFERENCES
[1] Gilles-Eric Descamps, Satish Bagalkotkar, Subramanian Ganesan, Satish Iyengar, Alain Pirson, "Design of a 17-million Gate Network Processor using a Design Factory", DAC 2003, California, USA.
[2] Encounter(TM) User Guide, Product version 4.2.2, Cadence Design Systems, August 2005.
