Vajeed Nimran
IIT Mumbai, India
bandlish@usc.edu
vajeed@iitb.ac.in
distribution structure and routing, and pose new noise and signal integrity problems. In the case of a small-scale change, an ECO can be accomplished by simply modifying the input netlist and following a traditional Verilog-ECO flow to incorporate the changes in a completely routed design. Most conventional ECO flows can handle design changes of up to, say, a few cell changes through Verilog editing, provided the changes do not significantly affect other parts of the design. If, however, the scale of change is extremely large, or the design is too complex for such fixes to be done manually, re-synthesis becomes necessary and this approach fails. The aim of this paper is to propose a flow for implementing such large-scale changes in designs that are at a very late stage of their design cycle, while having minimal impact on other parts of the design. The paper is organized as follows: Section 2 describes the traditional ECO flow and its limitations. These are addressed in Section 3, which describes the proposed ECO flow in detail. We share some experimental results in Section 4 and present our conclusion in Section 5.
A technique to implement large-scale RTL Engineering Change Orders (ECOs) in System-on-Chip (SoC) designs at their final stages of physical implementation is presented. The technique aims at minimizing the design cycle time for implementing critical-path ECOs that affect large parts of a design, so that designers are able to incorporate such changes without causing a major schedule impact. The proposed flow was successfully implemented and tested on a 90nm wireless Application Specific Integrated Circuit (ASIC) design that was a first-pass success on silicon.
Keywords
ECO, clock tree synthesis, Sea-of-Gates, optimization
1. INTRODUCTION
Baseband SoC design cycle times for 90 nanometer designs with a complexity of millions of gates can typically take six to eight months. Of this time, about three to four months are spent in freezing design specifications, getting functional Register-Transfer-Level (RTL) designs and generating gate-level netlists which are optimized for timing. Effectively, the physical design activities such as clock tree synthesis, routing and timing closure thus have about two to four months to complete. Physical implementation is also a stage where logic bug frequency is at its peak [1]. Given such a scenario, a physical design engineer is invariably hard pressed for time to close on any modifications of initial conditions, such as an ECO, especially when the change involved is fairly large. In addition, as post-route verification generally progresses in tandem with the routing activities, there may be design changes up to the very last stage of physical design. Design requirements may necessitate changes in an embedded IP, say, or a critical bug that needs to be fixed may be discovered at the verification stage. Given the aggressive time-to-market schedules and customer demands, implementing large-scale incremental changes or complete re-synthesis of affected modules in a design close to timing closure can become a physical designer's nightmare. As is evident, any large change would impact not only the timing and placement of logic but also the design's clock
Since incremental placement algorithms are intrinsically not as good as full-blown algorithms, an incremental placement is bound to be left with some timing violations. Many of these violations may not be fixable with the limited placement area available for the ECO. Such violations can then be fixed by using a multi-Vt flow in which standard-Vt cells, which are faster than the high-Vt cells but have higher leakage, are used. Therefore, while implementing the ECO, the physical designer must also take into account the number of standard-Vt cells that may need to be added to close on timing. Needless to say, it is imperative to keep the use of such cells to a minimum.
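The trade-off between timing and leakage described above can be illustrated with a small greedy selection. This is only a sketch under stated assumptions, not the procedure used on the design: the cell names, delay gains and leakage costs are hypothetical, a single violating path is considered, and a real flow would re-run timing analysis after each swap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapCandidate:
    cell: str               # hypothetical instance name on the violating path
    delay_gain_ns: float    # assumed delay saved by a high-Vt -> standard-Vt swap
    leakage_cost_nw: float  # assumed extra leakage of the standard-Vt cell


def pick_vt_swaps(slack_ns, candidates, leakage_budget_nw):
    """Greedily pick swaps with the best delay gain per unit of leakage
    until the path slack is non-negative or no affordable swap remains."""
    ranked = sorted(
        candidates,
        key=lambda c: c.delay_gain_ns / c.leakage_cost_nw,
        reverse=True,
    )
    chosen = []
    spent_nw = 0.0
    for cand in ranked:
        if slack_ns >= 0:
            break  # timing met, stop adding leaky cells
        if spent_nw + cand.leakage_cost_nw > leakage_budget_nw:
            continue  # this swap would exceed the leakage budget
        chosen.append(cand.cell)
        spent_nw += cand.leakage_cost_nw
        slack_ns += cand.delay_gain_ns
    return chosen, slack_ns, spent_nw


# Hypothetical path with 0.30 ns of negative setup slack.
example_candidates = [
    SwapCandidate("u1", 0.20, 10.0),
    SwapCandidate("u2", 0.15, 5.0),
    SwapCandidate("u3", 0.05, 8.0),
]
swapped, final_slack, leakage_used = pick_vt_swaps(-0.30, example_candidates, 20.0)
print(swapped, round(final_slack, 3), leakage_used)
```

Ranking by gain per unit leakage is one simple way to keep the number of standard-Vt cells low; other cost functions are equally plausible.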
To address the aforementioned limitations, we propose an ECO flow for efficiently implementing large-scale design changes. The design under consideration is the applications processor (AP) platform of a 90 nanometer wireless ASIC with a gate count of approximately 1.4 million equivalent placeable NAND gates in the Sea-of-Gates (SoG). The die size is approximately 24.144 mm² with a rectilinear floorplan (see Figure 2 below and Table 1).
There are some major limitations of this flow in implementing large-scale ECOs. Firstly, it is impractical to manually modify netlists that have a large number of changes. In this case, there would be no option but to re-synthesize the gate-level netlist from the module's RTL. Secondly, since most ECOs involve a small logic change, most incremental placement algorithms supported by commercially available tools are congestion driven rather than timing driven. This can become a major drawback when implementing a large-scale RTL design change, as the incremental logic may end up having new timing violations. Most importantly, if a considerable part of the design has been modified, it would inevitably lead to changes in the clock trees and routing topology. Current tools for synthesis, such as Physical Compiler (PC) from Synopsys, do have ECO capabilities but in practice work well only on changes of the order of 2-4% of the design. Beyond this, incremental synthesis and placement capabilities fail to provide optimal results from both the timing and the routing perspective, e.g. by creating localized over-congestion. Apart from the usual timing checks, designs in the nanometer scale face deep-submicron (DSM) issues such as signal integrity (SI), viz. delay and functional noise, as well as electromigration and DFM, which depend largely on route topologies. Therefore, in a routed design which is delay and glitch noise clean and has DSM issues addressed, implementing such an ECO involves not only ensuring that timing is met but also that this is accomplished with minimal routing changes, so that the noise analysis and repair cycle time after the ECO implementation does not become prohibitively large. Another important consideration is the power dissipation or leakage in the chip. This factor assumes particular importance in wireless devices, which have strict limits on power dissipation to increase battery life.
DESIGN PARAMETERS                  VALUES
No. of Placeable Instances         346293
No. of Nets                        356501
No. of hard blocks                 15
No. of IO Pins                     4833
Chip Size                          2.6948e+07 µm²
Core Size                          1.4931e+07 µm²
Utilization                        72%
No. of Clock Domains               40
Maximum Frequency of Operation     399 MHz
The flow was successfully implemented on two modules of this platform, sdma (light gray) and scmfbc (dark gray) (refer to Figure 2). Sdma constitutes about 15% of the SoG area of the AP platform and is one of the most timing-critical modules of the platform, working at 133 MHz. Scmfbc is comparatively smaller, at about 2% of the SoG area, and also works at 133 MHz. See Tables 2 and 3 below.
SCMFBC DESIGN PARAMETERS: No. of Cells, No. of Nets, No. of hard blocks, Utilization, Maximum Frequency of Operation, No. of Clock Domains
Table 2: SCMFBC Design Specs
SDMA DESIGN PARAMETERS: No. of Cells, No. of Nets, No. of hard blocks, Utilization, Maximum Frequency of Operation, No. of Clock Domains
The basic steps of the proposed ECO flow are outlined below (see Figure 3):
1. Re-synthesis of the module in which the ECO has been implemented
2. ECO placement of the module using First Encounter (FE) from Cadence Design Systems
3. Timing optimization in FE
4. Routing with minimal changes to the rest of the design
5. Post-route timing optimization
The timing constraints required for standalone synthesis of the affected module were characterized from the platform-level constraints. Using these inputs, the new gate-level netlist for the module was generated and then plugged into the original netlist, after removing the original instantiation.
3.1. Re-synthesis of the module in which the ECO has been implemented
The first step is to synthesize a gate-level netlist from the RTL design. For this, Synopsys Design Compiler was used and synthesis was done in the prototyping mode. Information about the physical attributes of the design, such as utilization and frame aspect ratios, was provided for synthesis.
From the database prior to the ECO, the approximate area occupied by the logic of the changed module was calculated. A placement guide was defined within that region and the logic of the ECOed module was assigned to that guide. A placement guide acts as a soft bound for placing the new module instances. All other instances, belonging to the rest of the already placed design, were fixed to prevent any movement during ECO placement. After assignment, a full-blown placement was run to place the module within the given bound.
To perform a timing-driven placement successfully in FE, it was important to achieve a good timing correlation between FE and the sign-off timing analyzer (PrimeTime(TM)). To accomplish that, a correlation flow for the two tools was developed. Parasitics were extracted under similar conditions and with similar reduction parameters from both FE and the sign-off extractor (StarRCXt(TM) from Synopsys). The extracted parasitics were then compared in the comparison tool Ostrich (Cadence) to obtain RC correlation factors to be applied to the synthesis tool's internal extractor before initiating the timing-driven placement. Using this method, we were able to achieve a correlation of up to 95% between the two.
After correlation, the next step was to read the design constraints for guiding the timing engine while performing logic optimization within the placement guide. Since Nanoroute(TM) (Cadence) was our sign-off router, the route estimates it made during timing-driven placement were closer to the final routes and thus gave more accurate results than those estimated by PC. Therefore, by using this approach we were able to obtain a starting placement that was not only good timing-wise but also optimized for routing.
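The idea of an RC correlation factor can be illustrated with a short calculation. The sketch below is an assumption about what such a factor could look like, not a description of what Ostrich computes: it fits a single least-squares scale factor (through the origin) that maps per-net values from one extractor onto the sign-off values, and reports the worst remaining relative error. The function names and the sample capacitances are hypothetical.

```python
def correlation_factor(tool_values, signoff_values):
    """Least-squares scale factor k minimizing sum((k * tool - signoff)^2)
    over all nets, i.e. k = sum(t * s) / sum(t * t)."""
    if len(tool_values) != len(signoff_values) or not tool_values:
        raise ValueError("need two equally sized, non-empty lists")
    denominator = sum(t * t for t in tool_values)
    if denominator == 0:
        raise ValueError("tool values must not all be zero")
    numerator = sum(t * s for t, s in zip(tool_values, signoff_values))
    return numerator / denominator


def worst_relative_error(tool_values, signoff_values, factor):
    """Largest |k * tool - signoff| / |signoff| over all nets after scaling."""
    return max(
        abs(factor * t - s) / abs(s)
        for t, s in zip(tool_values, signoff_values)
    )


# Hypothetical per-net capacitances (pF) from the two extractors.
tool_cap = [1.0, 2.0, 4.0]
signoff_cap = [1.1, 2.2, 4.4]
k = correlation_factor(tool_cap, signoff_cap)
print(round(k, 3), worst_relative_error(tool_cap, signoff_cap, k) < 0.05)
```

In practice separate factors for resistance and capacitance, or per routing layer, may be needed; a single global factor is the simplest case.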
With information about the pre-ECO delays in these subtrees, only these sections of the clock tree need to be re-synthesized, such that they meet the pre-ECO delays. In this way, almost all clock nets in other sections of the design can be preserved, and iterations for timing as well as noise closure on these clock nets can be reduced to a minimum. Routing of these new clock nets is then done in the ECO Route mode, details of which are explained in the following sections.
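The requirement that re-synthesized subtrees meet their pre-ECO delays can be expressed as a simple check. The following is a minimal sketch with hypothetical subtree names, delays and tolerance; the actual delay targets and acceptance criteria used in the flow are not given in the text.

```python
def subtrees_off_target(pre_eco_delay_ns, post_eco_delay_ns, tolerance_ns=0.05):
    """Return {subtree: delay change in ns} for every re-synthesized subtree
    whose insertion delay differs from its pre-ECO value by more than
    tolerance_ns. Subtrees within tolerance are omitted."""
    off_target = {}
    for name, target in pre_eco_delay_ns.items():
        if name not in post_eco_delay_ns:
            raise KeyError(f"no post-ECO delay reported for subtree {name!r}")
        delta = post_eco_delay_ns[name] - target
        if abs(delta) > tolerance_ns:
            off_target[name] = round(delta, 3)
    return off_target


# Hypothetical insertion delays of two subtrees inside the ECOed module.
pre_eco = {"clk_a/sub1": 1.20, "clk_a/sub2": 0.95}
post_eco = {"clk_a/sub1": 1.22, "clk_a/sub2": 1.10}
print(subtrees_off_target(pre_eco, post_eco))
```

Any subtree reported here would need another pass of partial clock tree synthesis before routing.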
able to stay within the leakage limits defined and fix almost all setup violations.
on a trial route. In addition, optimization is now done based on actual clock tree delays, i.e. clocks are propagated as opposed to ideal clocks used in pre-route optimizations.
3.5. Routing
Clock nets, being the most timing critical, are generally routed before signal nets to ensure the shortest routes and minimum detouring. While routing the ECO clock nets too, every effort should be made to ensure that the new clock nets are routed with these attributes while minimally affecting the clock routes of the rest of the design. To accomplish this, the following approach was used. After completing the partial clock tree synthesis on the module as well as pre-route timing optimization and transition fixing, all unchanged clock nets from the pre-ECO routed design were imported to the ECOed design, leaving out the signal nets. The new clock nets were then selectively routed on this design. Any routing violations caused by these nets with existing clock routes were then fixed in ECO mode to ensure minimal routing changes. See Figure 5.
[Figure 5, flow steps: CTS Implemented -> Remove Existing Routing]
4. EXPERIMENTAL RESULTS
Our proposed ECO flow was implemented on the applications processor (AP) platform of a 90 nanometer wireless ASIC. The AP platform has about 20 different peripherals. The ECO to be implemented involved a change in one of the processor cores of the platform as a result of a functional bug found at the post-layout verification stage. Since the change was major (17% of SoG) and occurred at a stage where the design was close to timing closure, the traditional flow would necessitate a complete re-synthesis of the SoG, which would mean a 5-6 week schedule impact. However, with our ECO flow, we were able to successfully implement the change, saving about 4.5 weeks in the process. (See Table 4)
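The saving quoted above can be put in relative terms with simple arithmetic. The sketch below only restates the figures from the text (a 5-6 week impact with the traditional flow and about 4.5 weeks saved); the function name is illustrative.

```python
def schedule_reduction(traditional_weeks, saved_weeks):
    """Fraction of the traditional schedule impact that is avoided."""
    if traditional_weeks <= 0:
        raise ValueError("traditional_weeks must be positive")
    return saved_weeks / traditional_weeks


# 4.5 weeks saved against the 6-week and 5-week ends of the quoted range.
reduction_low = schedule_reduction(6, 4.5)
reduction_high = schedule_reduction(5, 4.5)
print(f"{reduction_low:.0%} to {reduction_high:.0%}")
```

Both ends of the range lie above the reduction of more than 65% stated in the conclusion.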
Metrics                                                  SCMFBC      SDMA
No. of Placeable Instances                               6204        52896
WNS at Initial Placement Stage                           ~4ns        ~5ns
WNS after fully automated Timing Optimization in FE      -1.5ns      -1.2ns
No. of New Violating paths in the rest of the design     0           0
No. of Violating paths (after optimization)              30          700
No. of Standard-Vt cells used to fix these violations    <200        <300
Schedule Impact with traditional flow                    3-4 weeks   3-4 weeks
Schedule Impact with Proposed flow                       1 week      1.5 weeks
No. of Routing violations before ECO                     ~200
No. of Routing violations after ECO                      ~230
No. of Functional Noise Violations before ECO            ~6
No. of Functional Noise Violations after ECO             ~10
[Figure 5, flow steps: Route new clock nets -> Correct clock net DRCs in ECO mode and fix route topologies -> Import pre-ECO signal nets -> Incrementally route signal nets -> Final Routed Design]
Once these violations were fixed, route geometries of all clock nets were fixed to avoid further changes when new signal nets would be routed. The next step was to import the routing information of all the signal nets from the pre-ECO design to this database and route them incrementally to honor existing topologies as much as possible. Using this methodology, we were able to ensure that very few, if any, new functional noise violations were introduced (See Table 4), which again was instrumental in quick timing closure.
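The ordering of the routing steps can be summarized as a partition of the nets. The sketch below is illustrative only: the net names are hypothetical, and it assumes that a net present under the same name before and after the ECO is unchanged, which a real flow would have to verify against connectivity.

```python
def plan_eco_routing(pre_eco_nets, post_eco_nets, clock_nets):
    """Split the post-ECO nets into the four groups handled in sequence:
    imported clock routes, newly routed clock nets, imported signal
    routes, and signal nets left for incremental routing."""
    unchanged = pre_eco_nets & post_eco_nets  # same name in both designs
    new = post_eco_nets - pre_eco_nets        # introduced by the ECO
    # Dict insertion order mirrors the order in which the steps are run.
    return {
        "import_clock_routes": sorted(unchanged & clock_nets),
        "route_new_clock_nets": sorted(new & clock_nets),
        "import_signal_routes": sorted(unchanged - clock_nets),
        "route_signal_nets_incrementally": sorted(new - clock_nets),
    }


# Hypothetical net names.
pre_nets = {"clk_core", "clk_sdma_old", "data0", "data1"}
post_nets = {"clk_core", "clk_sdma_new", "data0", "data2"}
clocks = {"clk_core", "clk_sdma_new"}
routing_plan = plan_eco_routing(pre_nets, post_nets, clocks)
for step, nets in routing_plan.items():
    print(step, nets)
```

Keeping the imported routes fixed before each routing pass is what limits new routing and noise violations to the ECOed region.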
at 1.5ns negative slack for setup through this flow and the time saving on this module was even more, at about 3 weeks. Overall, this flow helped in closing timing on complex modules much ahead of expected time.
5. CONCLUSION
The methodology developed in this paper, to implement large-scale ECOs in a design in its last stage of physical implementation, was proven on a 90 nanometer wireless ASIC that was a first-pass success on silicon. Using this technique, we were able to incorporate a change affecting approximately 17% of the Sea-of-Gates area of the design while reducing the overall time for ECO implementation by more than 65% over traditional ECO flows.
6. REFERENCES
[1] Gilles-Eric Descamps, Satish Bagalkotkar, Subramanian Ganesan, Satish Iyengar, Alain Pirson, "Design of a 17-million Gate Network Processor using a Design Factory", DAC 2003, California, USA.
[2] Encounter(TM) User Guide, Product version 4.2.2, Cadence Design Systems, August 2005.