This document discusses digital physical design flows, including hierarchical and low power implementation flows. It provides an overview of the basic physical design cycle and describes flat, hierarchical, and low power physical implementation flows in detail. The key steps of each flow are explained, including floorplanning, placement, clock tree synthesis, and routing. Issues considered include power, timing, area, and manufacturing yield.
Implementation Flows Nasim Farahini farahini@kth.se Outline Overview: Digital design flow Flat physical implementation flow (basic flow) Hierarchical physical implementation flow Low power issues Low power physical implementation flow Design for Manufacturing / Design for Yield Sign-off 2 Overview: Digital Design Flow System Specification Architectural Design Logic Synthesis Physical Synthesis Physical Verification / Sign-off Fabrication Packaging and Testing The Physical Design Cycle Physical Design Cycle (Back End) Gate-level netlist Timing constraints (SDC file) Power constraints (CPF file)
GDSII LEF file: Standard cell layout Info
Lib file: Cell timing info placement floorplanning Metal Wires Clock Tree mask after OPC mask Results of Front-End Design Mask for IC Manufacturing Library of Technology Files Physical Design Cycle Design objectives: Power (dynamic/static) Timing (frequency) Area (cost) Yield (cost) Challenges: More complex systems; billions of transistors can be placed on a single chip Time-to-market Power-constrained design 5 Source: Intel (ISSCC-03) Physical design flows CAD tools have improved the basic implementation flow to address these challenges: Flat physical implementation flow (basic flow) Used for small and non-power-critical designs Hierarchical physical implementation flow Used for complex systems Divide-and-conquer method Sub-designs can be implemented in parallel and by a team Low power physical implementation flow Aggressive power management techniques can be used 6 Flat Physical Implementation Flow
Standard cells: layouts of library cells including logic elements like gates, flip-flops, and ALU functions. The height of the cells is constant. Physical Design Based on Standard Cells 8 Flat Physical Implementation Flow
Floorplanning: Laying out the chip Power Planning: Connecting up power Placement: Automated std-cell placement CTS: Clock tree synthesis Routing: Wiring up the chip Layout Verification Finishing: Metal fill / antenna fixing / via doubling / wire spreading 9 10 Full-chip Design Overview The location of the core, I/O areas, P/G pads, and the P/G grid Core placement area Periphery (I/O) area Rings Stripes P/G Grid IP ROM RAM Floorplanning Define the core area: (cells + utilization factor) Place the IO ring The IO ring is often decided by front-end designers with some input from physical design and packaging engineers. Shape and arrange hierarchical blocks Integrate hard IP efficiently Predict and prevent congestion hotspots and critical timing paths 11 PLL RAM RAM RAMs out of the way in the corner Large routing channels SUB-BLOCK Standard cells area Single large core area Pins away from corners Die size and initial standard-cell utilization factor trade-off Utilization refers to the percentage of the core area that is taken up by standard cells. A typical starting utilization might be 70% The space between the cells is used for routing and buffer insertion Larger die => higher cost, higher power High utilization can make it difficult to close a design: Routing congestion, negative impact during the optimization and legalization stages Solutions: Run a quick trial route to check for routing congestion Increase routing resources Floorplanning 12 Low std-cell utilization High std-cell utilization Power Planning In this step we determine: The general grid structure (gating or multi-voltage?)
Number and location of power pads (per voltage) Metal layers to be used (the top metal layers are typically used) Width and spacing of stripes Rings / no rings around the hard macros Hierarchical block shielding A denser power grid (trade-off): Reduces the risk of power-related failures Reduces signal routing tracks and increases the number of metal-layer masks 13 14 Power Grid Creation: Macro Placement Blocks with the highest performance and highest power consumption should be placed:
1- Close to border power pads (less IR drop)
2- Away from each other, so they are fed by different I/O power pins (to prevent electromigration) Placement Cost components for standard-cell placement Area Wire length Timing -> Timing-driven placement Congestion -> Congestion-driven placement Clock -> Clock gating Power -> Multi-voltage Placement Critical paths are determined using static timing analysis (STA). In general there is a direct trade-off between congestion and timing: Timing-driven placement tries to shorten nets, whereas congestion-driven placement tries to spread cells, thus lengthening nets. Iterative placement trials should be performed to find a balance between the different tool options/settings. 15 Traditional Placement General Concept of Clock Tree Synthesis 16 Skew Power Area (#buffers) Slew rates Netlist: unbuffered clock tree Buffered/balanced clock tree CLK CLK + Minimize total insertion delay (latency) Clock Tree Synthesis 17 Routing Fundamentals The goal is to realize the metal/copper connections between the pins of standard cells and macros Input: placed design, a fixed number of metal/copper layers Goal: a routed design that is DRC-clean and meets setup/hold timing
It consists of two phases: 1. Global route: to estimate the routing congestion 2. Detail route: to assign the nets to the routing tracks Horizontal routing tracks Vertical routing tracks Standard cell pin Global Routing 18 Horizontal routing capacity = 9 tracks Vertical routing capacity = 9 tracks Global Routing Input: Cell and macro placement Routing channel capacity per layer / per direction Goal: Perform fast, coarse grid routing through global routing cells (GCells) while considering the following: Wire length Congestion Timing Noise / SI 19 Often used by placement engines to predict congestion, in the form of a trial route or virtual route Detailed Routing Assigns each net to a specific track and lays down the actual metal traces Makes long, straight traces and reduces the number of vias Reduces cross-coupling capacitance Solves DRC violations 20 Hierarchical Physical Implementation Flow
Why create hierarchy? 22 Hierarchy provides tighter control of individual blocks because the boundaries are well-defined. You can eliminate data-size issues and tool capacity limitations Hierarchy reduces design time by Reducing data size; faster runtime Using the parallelism that is inherent in hierarchical implementation; the system can be designed by a team Hierarchy provides support for reuse. The challenges compared to the flat design flow Much more difficult fullchip timing closure More intensive design planning needed: repeater insertion, timing-constraint budgeting What is a Hierarchical Design? Hierarchical design can be divided into three general stages: Chip planning Break the design down into block-level designs to be implemented separately. Implementation This stage consists of two sub-stages: Block implementation for a block-level design Top-level implementation for a design based on block-level design abstracts and timing models Chip assembly Connect all block-level designs into the final chip 23 Fullchip Design Blk 1 Blk 2 Blk 3 P&R Flow P&R Flow P&R Flow Fullchip Timing & Verification Top-Down vs. Bottom-up Hierarchical Flow Top-down flow: Import the top-level design as a flat design Floorplan the design and define partitions Pin assignment and time budgeting of the partitions based on the top-level constraints Block-level design size, pins, and standard-cell placement will be guided by the top-level floorplanning and I/O pad locations. Bottom-up flow: Consists only of the implementation and assembly stages. The size, shape, and pin positions of the block-level designs drive the top-level floorplanning. Each block in the design must be fully implemented; the blocks are then imported as black boxes into the top level. 24 Logical Hierarchy vs. Physical Hierarchy 25 module chip ( in1, in2, in3, out1, out2, ...) module block1 ( a, b, c, ...) module sb1 ( x, y, z, ...) in1 in2 in3 out1 out2 b c Chip level Pads block level pins chip module sb2 ...)
module sb3 ( x, y, z, ...) sb1 sb2 sb3 block1 The modules that correspond to the partitions need to exist in the netlist. Chip Planning for Hierarchical Design Initialize floorplan and IOs Specify the partitions Power grid insertion Clock planning Feedthrough insertion Quick placement Trial route Partition pin assignment Timing budgeting Commit partition / physical pushdown Partitions are ready for block-level implementation Hierarchical Design: Specify Partitions / Plan Groups The netlist must have the partitions as top-level modules. Partitions are generally sized according to a target initial utilization: ~70% utilization, ~300k-700k instances Channels or abutment Rectilinear block shapes are possible Abutment Channels Rectilinear Blocks Hierarchical Design: Clock Planning Global clock trees (H-trees) Can reduce total insertion delay and balance full-chip skew At least one endpoint per block Distribution of other high-fanout nets should also be considered Hierarchical Design: Feedthrough Insertion For channelless designs or designs with limited channel resources Requires a change in the partition netlist [Figure: nets Net1 and Net2 crossing partitions A, B, and C are feedthrough candidates; they are split into segments Net1a/Net1b and Net2a with new partition I/O pins] Hierarchical Design: Partition Pin Assignment Pin guides are created for every partition. Pins are positioned based on the top-level floorplanning, placement, and routing. Objectives: reduce total wire length, reduce congestion, high-quality top-level routing Pins at partition corners can make routing difficult Hierarchical Design: Timing Budgeting Chip-level constraints must be mapped correctly to block-level constraints The design must be placed, trial-routed, and have pins assigned before running budgeting Block-level constraints will be assigned as input or output delays on I/O ports based on the estimated timing slack.
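The budgeting step above can be sketched in a few lines: the slack of a cross-partition path is split among its segments, and the portion of the path outside a block becomes a set_input_delay on the block's port. The proportional-slack policy, block names, and delay values below are illustrative assumptions, not any particular tool's algorithm.

```python
# Hypothetical sketch of timing-constraint budgeting across a partition
# boundary. Names, delays, and the proportional policy are assumptions.

def budget_path(clock_period, delays):
    """Split the slack of a top-level path proportionally to each
    segment's estimated delay; return per-segment timing budgets."""
    total = sum(delays.values())
    slack = clock_period - total
    return {seg: d + slack * d / total for seg, d in delays.items()}

# Estimated delays (ns) for one register-to-register path that crosses
# from block A, over a top-level route, into block B.
delays = {"blockA_out": 1.0, "top_route": 0.5, "blockB_in": 1.5}
budgets = budget_path(clock_period=6.0, delays=delays)

# Block B's port constraint: the input delay is everything the port
# sees outside the block (block A's budget plus the top-level route).
input_delay = budgets["blockA_out"] + budgets["top_route"]
print(f"set_input_delay {input_delay:.2f} [get_ports IN1]")
```

As the slide that follows notes, these budgets are estimates, which is why sign-off must still be run against the full-chip constraints.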
Sign-off must be done with full-chip constraints, since budgeted constraints are rough estimates only. IN1 Block Boundary set_input_delay 1.5 [ get_port IN1 ] 1.5ns Hierarchical Design: Commit Partition and Block Level Implementation Commit partition Power nets and pre-routed signal routes are pushed down into the appropriate partition based on their physical location. A physical database file (e.g. DEF), a Verilog netlist, and a constraint file (SDC) are created for each new partition. Block Level Implementation Implementation based on the guidelines provided by chip-level planning The output of this phase is the P&R netlist and the timing model of the block. These files are used in the chip assembly phase. Hierarchical Design: Fullchip Timing Closure Fullchip timing closure is typically a bottleneck for design cycles. The block-level P&R flow guarantees that the timing constraints inside the block (flop-to-flop) are met. The block-level P&R flow does not emphasize io-to-flop, flop-to-io, and io-to-io timing paths, because budgeted constraints are only estimates Interface logic models (ILMs) can be used for fullchip timing closure Interface Logic Model (ILM) ILM is a technique to model blocks in hierarchical implementation flows. Logic that only consists of register-to-register paths in a block is not part of an ILM. ILMs do not abstract; they simply discard what is not required for modeling boundary timing. This model is used to speed up timing analysis runs when the fullchip design is too large. 34 [Figure: original netlist of the partition vs. its interface logic model (ILM), which keeps only the logic between the ports and the boundary registers] Low Power Physical Implementation Flow
Voltage scaling for low power 36 Lowering VDD reduces dynamic power, since P ∝ VDD^2, but also reduces speed, since I_ds ∝ (VDD − V_th)^(1~2). Speed can be recovered by lowering V_th, but a low V_th means high leakage: I_leakage ∝ e^(−c·V_th), roughly ×12 per 100 mV of V_th decrease (and leakage also depends on temperature) — so again a power problem. 37 Power Consumption and Reliability Static power (leakage power), dynamic power, IR-drop / voltage drop, electromigration (EM): 1 out of 5 chips fails due to excessive power consumption. IR-drop is an average-power problem; EM is a power-density problem in the long run; both are addressed by the floorplan and the design of the power grid. IR-Drop The drop in supply voltage over the length of the supply line. A resistance matrix of the power grid is constructed; the matrix is solved with a current-source model at each node to determine the IR-drop. Static IR-drop analysis: the average current of each gate is considered Dynamic IR-drop analysis: the current of each gate as a function of time is used (the actual switching events are considered) 38 39 [Figure: along the supply line from the VDD pad, the actual voltage level falls below the 3.0 V ideal voltage level toward the minimum tolerance level] IR-drop effects Unpredictable performance (e.g. the effect of crosstalk is enlarged) Logic failures due to reduced noise margins Decreased performance (timing) Excessive clock skew (clock drivers) Electromigration (EM) Electromigration refers to the gradual displacement of the metal atoms of a conductor as a result of the current flowing through that conductor (transfer of electron momentum). It can result in catastrophic failure due to either an open (a void on a single wire) or a short (bridging between two wires). Even without an open or short, EM can cause performance degradation (increase/decrease in wire RC). 40 Power Reduction at Different Levels System architecture: software/hardware power management, voltage scaling / frequency scaling, multiple voltage islands, power-aware algorithms, IP selection (performance vs. power) Implementation: clock gating, logic structuring, multi-V_th cell selection to reduce leakage, multi-voltage islands, power gating Process: CMOS low-leakage process techniques (high-K, etc.) 41 Modern Digital Low Power Flow 42 1 integrated cell to avoid glitches!
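The static IR-drop analysis described earlier can be illustrated with a deliberately tiny model: a single one-dimensional power rail fed from one VDD pad, where each segment carries the sum of all downstream average tap currents. The segment resistances and tap currents below are assumed values; real tools solve the full two-dimensional resistance matrix of the grid.

```python
# Minimal static IR-drop sketch for one 1-D power rail fed from a
# single VDD pad. Resistances/currents are illustrative assumptions.

def rail_ir_drop(vdd, seg_res, tap_currents):
    """Voltage at each tap of the rail: the current through segment k
    is the sum of all tap currents drawn downstream of it."""
    voltages = []
    v = vdd
    for k, r in enumerate(seg_res):
        downstream = sum(tap_currents[k:])   # current through segment k
        v -= downstream * r                  # IR drop across segment k
        voltages.append(v)
    return voltages

# 4 taps, 50 mOhm per segment, 10 mA average current per tap
volts = rail_ir_drop(1.0, [0.05] * 4, [0.010] * 4)
print([f"{x:.4f}" for x in volts])
```

The voltage falls monotonically away from the pad, which is why macro placement close to border power pads (less IR drop) matters.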
Low power logic implementation techniques 1- Multi-voltage and power gating techniques modify the netlist and connectivity, and insert special cells 2- Use of a set of power constraint files (CPF/UPF), just like timing constraint files 3- Clock gating Extra cell: an Integrated Clock Gate (ICG) prevents glitch propagation to the gated clock GCLK
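The glitch-free property of the integrated clock gate can be shown with a small behavioral model (an assumption-level sketch, not a library cell's transistor implementation): a latch that is transparent on the low clock phase feeds an AND gate, so an enable that changes while the clock is high cannot reach GCLK.

```python
# Behavioral sketch of an integrated clock-gating (ICG) cell:
# a level-sensitive latch (transparent while CLK is low) plus an AND
# gate. The enable can only take effect while CLK is low, so no glitch
# propagates to the gated clock GCLK. Illustrative model only.

class ClockGate:
    def __init__(self):
        self.latched_en = 0

    def step(self, clk, en):
        if clk == 0:                   # latch is transparent on low phase
            self.latched_en = en
        return clk & self.latched_en   # GCLK = CLK AND latched enable

icg = ClockGate()
# The enable rises in the middle of a high clock phase (a would-be
# glitch); the latch holds it off until the next low phase.
trace = [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1), (1, 0), (0, 0)]
gclk = [icg.step(clk, en) for clk, en in trace]
print(gclk)
```

Without the latch (a plain AND of CLK and EN), the third cycle above would produce a partial high pulse on GCLK.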
Low power logic implementation techniques 4- Operand isolation No extra library cell is needed Reduces dynamic power 5- Gate level power optimization Extra specialized standard cells are needed Reduces dynamic power
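Operand isolation can be sketched behaviorally: when the downstream result is not used, the operand inputs of an expensive datapath operator are forced to a constant, so the operator sees no switching activity. The function name and the toy multiplier below are illustrative assumptions.

```python
# Sketch of operand isolation: AND gates hold the operands at zero
# while the result is unused, so the (power-hungry) multiplier's inputs
# do not toggle. Toy model; a real design isolates at the RTL/netlist.

def isolated_multiply(a, b, result_used):
    a_iso = a if result_used else 0   # AND-gate isolation of operand a
    b_iso = b if result_used else 0   # AND-gate isolation of operand b
    return a_iso * b_iso              # multiplier sees stable 0 inputs

print(isolated_multiply(6, 7, True))
print(isolated_multiply(6, 7, False))
```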
Modern Digital Low Power Flow 43 Extra cells: 2 or more libraries are needed, e.g. High-VT, Low-VT, and Standard-VT Modern Digital Low Power Flow 44 Leakage Power Optimization A multi-V th library is the key factor of leakage power optimization Using high V th cells on non-critical paths to save power Using low V th cells on critical paths to improve timing [Figure: delay vs. leakage current for low, nominal, and high V TH cells — leakage falls and delay rises as V TH increases] Multi-Threshold Low power logic implementation techniques 6- Multi-Vth insertion strategies
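One possible multi-Vth insertion strategy is a greedy pass (a sketch under assumed delay and leakage numbers, not a vendor algorithm): start with all cells at low Vth and swap cells to high Vth while the path's positive slack can absorb the added delay.

```python
# Greedy sketch of a multi-Vth insertion strategy: swap cells to
# high Vth (slower, much less leaky) as long as the path's positive
# slack covers the delay penalty. Numbers are assumed, not library data.

def swap_to_high_vth(cells, slack, delay_penalty):
    """cells: list of (name, leakage_saving). Swap greedily, largest
    leakage saving first, while slack remains for the delay penalty."""
    swapped, saved = [], 0.0
    for name, saving in sorted(cells, key=lambda c: -c[1]):
        if slack >= delay_penalty:
            swapped.append(name)
            saved += saving
            slack -= delay_penalty    # each swap consumes some slack
    return swapped, saved

cells = [("U1", 5.0), ("U2", 3.0), ("U3", 4.0)]
swapped, saved = swap_to_high_vth(cells, slack=0.25, delay_penalty=0.1)
print(swapped, saved)
```

Cells left at low Vth are exactly those on paths too timing-critical to slow down, matching the slide's rule of thumb.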
Modern digital Low Power Flow Low power physical implementation: Floorplanning and Power planning Power Network synthesis (PNS) Power Network Analysis (PNA) Low power placement Register clustering Low power CTS Minimizing clock tree capacitance Low power routing
45 Low Power Techniques Supported by Physical Implementation Tools 7- Multi-voltage (reduces dynamic power) Multiple different core voltages in the same chip 8- Power gating (reduces leakage power) Coarse and fine grained State retention mechanism 9- Dynamic voltage and frequency scaling To adapt the power consumption to the workload
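Technique 9, dynamic voltage and frequency scaling, amounts to choosing the lowest operating point that still covers the current workload. The operating-point table below is an illustrative assumption, not a real part's datasheet.

```python
# Toy DVFS policy: pick the lowest (frequency, voltage) operating point
# whose frequency covers the required workload. Since P is proportional
# to V^2 * f, running at the lowest sufficient point saves power.

# (frequency MHz, voltage V) operating points, lowest first (assumed)
OPP = [(200, 0.8), (500, 0.9), (1000, 1.1)]

def pick_opp(required_mhz):
    for freq, volt in OPP:
        if freq >= required_mhz:
            return freq, volt
    return OPP[-1]                    # saturate at the fastest point

print(pick_opp(150))
print(pick_opp(600))
```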
46 Standard Databases, Low Power Cells Additional cells are required for low power techniques: Integrated clock gating cells For standard clock gating Level shifters For multi-voltage implementation Isolation cells For power gating implementation State retention registers For power gating implementation Always-on buffers For power gating implementation Power gate cells Header/footer switches For power gating implementation 47 Multi Voltage Design 48 core IP RAM ROM PD1 PD2 PD3 Define power domains create power domain names list of cells connected to VDD1, VDD2, GND1, draw the power domains Place macros Take into account: routing congestion, orientation Manual is usually better than auto (take info from the FE) Multi-Voltage Design 49 MV Level-Shifter Cells [Figure: power domains at 0.9 V, 0.7 V, and 1.08 V connected through level-shifter (LS) cells; logic model of a dual H-L and L-H level shifter with VDD1, VDD2, VSS, IN, and OUT pins] Additional Cells Low-to-High Level Shifter High-to-Low Level Shifter Multi-Voltage Design: Level Shifters 50 Example P&G for a domain with level shifters [Figure: level-shifter (LS) region straddling the VDD1 (0.9 V) and VDD2 (0.7 V, switchable OFF) rails] Additional Cells Isolation cells Retention flops Power gates Always-on buffers Power Gating Header Switch Footer Switch Floorplan of footer switch: same height as standard cells, or double [Figure: footer switch with VDD/VDDG, VSS/VSSG rails and SLEEP control] Power Gating: Power Gates Power switches are used to shut down the unused areas 53 Power Gating: Switch Layout - Ring style Sleep switches are located between the always-on power ring and the virtual power ring (VVDD) Easy to implement compared to grid style, with less impact on placement and routing Large IR-drop (switch resistance + thin VVDD net) Used for power gating of hard IPs and small blocks Does not support retention registers Also called coarse-grained 54 Power Gating: Switch Layout - Grid style The VDD network spans the whole chip; virtual power networks in each gated domain Switches are placed in a grid connecting VDD and the VVDDs Improved IR-drop characteristics, because every switch drives a small number of local instances Large impact on placement and routing due to the distributed switches Supports retention registers Also called fine-grained 55 Power Gating: Isolation Cells Isolation cells ensure the electrical and logical isolation of the cells in a shut-down region from the active logic in a design.
When a block is shut down, its internal signals may transition to an unknown, floating state -> incorrect functioning of the rest of the design Prevent snake paths that let current flow between power and ground when the cells driving the shut-down region are improperly designed To be added to the input/output signals of the shut-down region Retention registers have a shadow high-Vth latch built in, which is connected to an always-on voltage Comprehensive testing is required Data should be restored to the main register (low-Vth) within a few cycles after the block is awakened Power Gating: Retention Registers 56 Retention register - preserves state while the logic is turned off [Figure: retention registers (RR) in a switchable 0.9 V / 0.7 V domain with sleep control CTR and an always-on 1.08 V supply] Power Gating: Retention Registers 57 Example P&G for a domain with retention registers [Figure: retention-register region with VDD/VDDG and VSS rails next to an LS region spanning VDD1 and VDD2] Buffering of signals in powered-down areas Signals crossing from one active area to another active area need buffering in the powered-down block Power control signals The VDD or VSS pins of always-on buffers are not directly connected to the power rails; they are connected during routing to unswitchable power/ground [Figure: a normal inverter has power rails only; an always-on inverter has power rails plus power pins — VDD_local/VSS_local are switchable, VDD_global/VSS_global are always on] Power Gating: Always-on Buffers Design Considerations for 90nm Technology and Beyond
Processing issues for <90nm technologies 61 For <90nm technologies, aggressive DRC and enhanced DFM/DFY techniques should be applied to increase manufacturability and design yield Design Rule Check (DRC) Guidelines about the geometry constraints for constructing process masks Design For Manufacturing (DFM) Design techniques that allow the design to be manufactured correctly Design For Yield (DFY) Design techniques used to improve yield
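To make the DRC idea concrete, here is a toy check for two common rule types, minimum width and minimum spacing, on axis-aligned rectangles. The rule values and the rectangle representation are assumptions for illustration, not any foundry's actual rules.

```python
# Minimal flavor of a design-rule check: verify minimum width and
# minimum spacing for rectangles (x0, y0, x1, y1) on one metal layer.
# Rule values are assumed; real runsets contain hundreds of rules.

MIN_WIDTH, MIN_SPACE = 0.10, 0.12    # microns, assumed rules

def width_ok(r):
    x0, y0, x1, y1 = r
    return min(x1 - x0, y1 - y0) >= MIN_WIDTH

def spacing_ok(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    dx = max(bx0 - ax1, ax0 - bx1, 0.0)   # horizontal gap (0 if overlap)
    dy = max(by0 - ay1, ay0 - by1, 0.0)   # vertical gap (0 if overlap)
    # Touching/overlapping shapes merge into one net: not a spacing check
    return max(dx, dy) >= MIN_SPACE or (dx == 0.0 and dy == 0.0)

wire_a = (0.0, 0.00, 1.0, 0.10)
wire_b = (0.0, 0.15, 1.0, 0.25)       # only 0.05 um away: violation
print(width_ok(wire_a), spacing_ok(wire_a, wire_b))
```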
DRC, DFM and DFY DRC (Design Rule Check) Information like: Routing layers: width, spacing, pitches General rules: (a) enclosure, (b) space, (c) overlap, (d) width, (e) extension Specific rules: antenna rules, metal density rules, minimum area A compromise between performance and yield More conservative rules increase the probability of correct circuit function (yield) More aggressive rules increase circuit performance (area, power, delay) 62 63 DRC: General Rules [Figure: the general rules illustrated on layout shapes — (a) enclosure, (b) space, (c) overlap, (d) width, (e) extension] DRC: Metal Density (Al) In low-density regions the resist-based Al etching takes longer, causing over-etching of the inter-level dielectric; a minimum metal density is needed Solution: Add dummy metal structures to maintain the minimum metal density 65 DRC: Metal Density (Cu) The tantalum barrier layer is hard to remove during chemical-mechanical polishing => not too much Ta barrier, i.e. a maximum density rule in addition to the minimum The softness of Cu results in dishing of the inter-level dielectric DRC: Max Metal Density Fat-wire problem: cracks may occur due to thermal expansion stress when large currents flow Solution 1: slots Solution 2: split wires (in GDSII, a different datatype on the Mx layers) 66 DRC: Recommended Rules Wire spreading Avoid asymmetrical contacts Layout guidelines for yield enhancement Guidelines for optimal electrical model and silicon correlation 67 68 DRC Challenges [Figure: count of design rules in the DRC runset vs. technology node, growing steeply from 180 nm to 45 nm] Reasons: More metal layers Different spacing rules depending on width Recommended rules in addition to general rules DFM / DFY: Techniques Redundant via insertion (multi-cut vias) 90 nm: recommended rule (yield increase) Some tools do concurrent redundant via insertion Can also be done afterwards (post-route fixing) Place where possible Via Reduction Minimize the total number of vias A significant percentage of defects are traced to via failures
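The redundant via insertion described above can be sketched as a post-route pass: for every single-cut via, try to place a second cut on an adjacent free grid point. The grid model and the neighbor search order below are illustrative assumptions.

```python
# Sketch of post-route redundant via insertion on a routing grid:
# each single-cut via gets a second cut on an adjacent free grid point
# if one exists. Grid model and neighbor order are assumptions.

def add_redundant_vias(vias, blocked):
    """vias: set of (x, y) via locations; blocked: grid points occupied
    by other shapes. Returns the added redundant-via locations."""
    added = set()
    occupied = set(vias) | set(blocked)
    for x, y in sorted(vias):
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (nx, ny) not in occupied and (nx, ny) not in added:
                added.add((nx, ny))   # place the second cut here
                break                 # one redundant cut per via
    return added

vias = {(0, 0), (2, 0)}
blocked = {(1, 0)}                    # routing occupies the point between
extra = add_redundant_vias(vias, blocked)
print(sorted(extra))
```

Where no neighbor is free, the via stays single-cut — which is why the slide says "place where possible."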
69 DFM / DFY: Techniques Wire Straightening (reduce jogs) Bent wires are particularly prone to greater lithographic variations Wire Spreading Spacing wires can reduce the probability of a particle defect causing chip failure [Figure: a particle defect causing a short between two close wires; more space prevents the short] 70 Sign-off Parasitic RC extraction Advanced delay calculation & signal-integrity analysis Advanced IR-drop and electromigration analysis Thermal map and its influence on timing Noise analysis Inter-die and intra-die variation At 65 and 45 nm, the effects of inter-die and intra-die variations become significant; a statistical analysis approach is used to factor in the variations Logic equivalence check Send Verilog + SPEF (SDF) to the front-end designers for final verification Layout verification Design Rule Check (DRC) Layout vs. Schematic (LVS) Transfer to the design finishing group LVS (Layout vs. Schematic) Extract the designed devices (nmos, pmos, n-well tap, ...) Extract the connectivity between them Build a netlist Compare both netlists Top-level labels are needed for VDD, VSS, inputs, and outputs 72 Summary The flat and hierarchical physical implementation flows were discussed Low power challenges and a standard low power physical implementation flow were discussed Processing issues for small technology nodes were explained Solutions to improve manufacturability and yield were discussed 73 74 References 1. Advanced Digital Physical Implementation Design, IDESA Course, 2012. 2. Cadence Encounter Digital Implementation (Hierarchical) training course material, 2013. 3. "Sleep Transistor Design and Implementation - Simple Concepts Yet Challenges To Be Optimum", K. Shi, D. Howard, VLSI DAT 2006. 4. "Dual threshold voltages and power-gating design flows offer good results", Kaijian Shi (Synopsys Professional Services), EDN, Feb. 2, 2006. 5. Jupiter XT Training, Version 2005.09, Synopsys CES. 6. What's New in Galaxy Low Power 2007.03, Manoz Palaparthi, SNUG 2007 Tutorial. 7.
"Automating RT-Level Operand Isolation to Minimize Power Consumption in Datapaths", M. Munch, B. Wurth, R. Mehra, J. Sproch, and N. Wehn.