
Low Power MIPS Processor Design

Jingpeng Lv [1], Xianzong Xie [1], Kyung Jin Park [1], Byong Wu Bernard Chong [2]
[1] Department of Electrical and Computer Engineering
[2] Department of Computer Science
University of Utah, Salt Lake City, UT 84112

Abstract: Power consumption has become one of the major challenges in IC design. This
paper presents two power-saving methods applied to a MIPS processor design: clock gating and
a multi-voltage power supply. The experiments showed that the clock-gating scheme saved more
than 65% of the power of the baseline implementation while decreasing performance by 45%. We
also successfully implemented a multi-voltage MIPS processor, even though we faced
several problems along the way.
Index Terms: VLSI, multi-voltage supply, MIPS, critical path

I. Introduction
Our prime objective is to implement a power-efficient MIPS microprocessor. This objective was
motivated by the fact that IC designs have become more complex, and reducing power
consumption has become the first factor to be considered in IC design. This demand is
increasing especially for battery-based electronic systems such as laptops and cellular phones.
As far as we know, there are three well-known power-saving optimization methods: clock
gating, back body biasing and multi-voltage supplies (MSV). We first researched back body
biasing; however, because we found only a limited number of documents on it, we chose clock
gating and MSV as our methods for decreasing power consumption.
We implemented clock gating first and then applied MSV to our baseline MIPS implementation.
Specifically, we applied clock gating to the baseline implementation and made a detailed
comparison of performance and power. The results show that the clock-gated MIPS saves
65% of the power compared to the baseline. There is some reduction in performance; however, we
found that the power-delay product is much better for the clock-gated MIPS processor.
The concept of MSV is to segregate the power supply of specific modules and to reduce the
voltage of the modules that do not lie on the critical path, while keeping the performance of
the whole chip unchanged. The critical problem here is to distinguish the critical modules from
the non-critical modules. One of the solutions we found was to obtain the critical path using
PrimeTime PX and then analyze it, which helped us identify the critical modules. Moreover, we
found that dividing the modules into several submodules increased the non-critical area.
Our project has implemented a baseline MIPS (8-bit) microprocessor and a clock-gated version of
the baseline. Based on these two versions of MIPS, a detailed analysis was made. We also
implemented an MSV-applied MIPS processor with an LVS-bypassing scheme.
II. Project Design
2.1 Multi-Voltage CMOS Design
The key idea of the MSV project design is that we divide the whole MIPS processor into
different modules based on functional similarity. By doing this, the different parts are
functionally independent; however, from the perspective of the whole chip, they are connected
to each other.
At the very beginning, a flattened MIPS processor is built. It is the baseline for our design,
and all other designs are compared with it. The MIPS processor is then divided into
three parts: datapath, controller and alucontroller. These three modules are used to
build the unflattened MIPS processor. We determine the delay of each module and, based on the
delay information, we can determine the critical path of the entire chip. The relationship
between the supply voltage and the delay is that the higher the supply voltage, the smaller the
delay. So we can determine the supply voltage of each module based on its delay, with the
modules on the critical path connected to the highest supply voltage.
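The assignment rule described above can be illustrated with a minimal sketch. It assumes three hypothetical supply levels (the actual voltages are not stated here) and infers the critical modules from Table I, where the datapath and controller delays (13 + 6) add up to the critical-path delay of 19. The function name `assign_supplies` and the level values are illustrative assumptions.

```python
# Hypothetical supply levels in volts, highest first (assumed values).
SUPPLY_LEVELS = [5.0, 4.0, 3.0]


def assign_supplies(module_delays, critical_modules, levels=SUPPLY_LEVELS):
    """Map each module name to a supply voltage.

    Modules on the critical path keep the highest supply. The remaining
    modules are ranked by delay: the slower the module, the higher the
    supply it receives among the lower levels.
    """
    assignment = {name: levels[0] for name in critical_modules}
    others = sorted(
        (name for name in module_delays if name not in critical_modules),
        key=lambda name: module_delays[name],
        reverse=True,
    )
    lower_levels = levels[1:] or levels
    for rank, name in enumerate(others):
        # Reuse the lowest level once the list of levels is exhausted.
        assignment[name] = lower_levels[min(rank, len(lower_levels) - 1)]
    return assignment


# Module delays taken from Table I; the critical set is an inference.
delays = {"datapath": 13, "alucontrol": 3, "controller": 6}
plan = assign_supplies(delays, critical_modules={"datapath", "controller"})
print(plan)
```

With these inputs only alucontrol is moved to a lower supply, which matches the idea that voltage is reduced only off the critical path.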
The voltage shifter (voltage interface circuit) is another important module in the design.
Because different modules have different supply voltages, a potential problem arises: when the
output of a low-voltage domain drives a high-voltage domain, the connection may fail. To solve
this problem, we have to introduce a voltage shifter. We provide two kinds of voltage
shifters. The simpler one is nothing more than a buffer; by adjusting the widths of the NMOS and
PMOS transistors, we can get the corresponding output characteristic. The other is a
universal shifter. Compared with the former, it is more powerful while consuming less power.
Now we move to the details of our design.
There are three modules in the unflattened MIPS processor: datapath, controller and
alucontroller. The datapath includes two smaller modules, alu and registerfile. The schematic
of the whole chip is shown in Figure 1.

Figure 1: Schematic of the unflattened MIPS
The universal voltage shifter is used for communication between power domains with
different voltages. When a signal goes from a low-voltage power domain to a high-voltage power
domain, the voltage must be raised; when it goes from a high-voltage power domain to a
low-voltage power domain, the voltage shifter is optional. The schematic of this voltage
interface circuit is shown in Figure 2.
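The rule for when an interface circuit is needed can be written down as a small check. This is only a sketch: the function name `shifter_requirement` and the example voltages are assumptions made for illustration.

```python
def shifter_requirement(driver_vdd, receiver_vdd):
    """Classify a domain crossing by the supplies on both sides.

    low -> high : the signal must be raised, so a shifter is required
    high -> low : a shifter is optional
    same supply : no shifter is needed
    """
    if driver_vdd < receiver_vdd:
        return "required"
    if driver_vdd > receiver_vdd:
        return "optional"
    return "not needed"


# Hypothetical supplies (volts) for two domains.
low_domain, high_domain = 3.0, 5.0
print(shifter_requirement(low_domain, high_domain))   # low drives high
print(shifter_requirement(high_domain, low_domain))   # high drives low
```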

Figure 2: Schematic of the voltage interface circuit

We define the critical path by its latency. The path with the largest latency is the critical
path of the circuit, and all other paths are non-critical paths, as shown in
Figure 3.

Figure 3: Critical path and non-critical paths based on latency. The dark line shows the
critical path of this circuit. The blue cells are on the non-critical paths.
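As an illustration of this definition, the sketch below finds the largest-latency path through a small acyclic network of cells. The cell names and latencies are made up for the example; in our flow the critical path comes from the timing reports, not from a script like this.

```python
from functools import lru_cache


def critical_path(edges, delay):
    """Return (latency, path) of the longest-latency path in a DAG.

    edges: dict mapping a cell to the list of cells it drives
    delay: dict mapping every cell to its own latency
    """
    @lru_cache(maxsize=None)
    def best_from(cell):
        # Best continuation among the fan-out cells, or nothing at an output.
        tails = [best_from(nxt) for nxt in edges.get(cell, [])]
        tail_latency, tail_path = max(tails, default=(0, ()))
        return delay[cell] + tail_latency, (cell,) + tail_path

    return max(best_from(cell) for cell in delay)


# Hypothetical four-cell circuit: g1 drives g2 and g3, both drive g4.
delay = {"g1": 2, "g2": 3, "g3": 1, "g4": 4}
edges = {"g1": ["g2", "g3"], "g2": ["g4"], "g3": ["g4"]}

latency, path = critical_path(edges, delay)
non_critical = sorted(set(delay) - set(path))
print(latency, path, non_critical)
```

Cells that never appear on the returned path (here g3) are the candidates for a lower supply voltage.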
2.2 Clock Gating Design
Clock gating is one of the power-saving techniques used in many synchronous circuits. To save
power, clock gating adds logic to a circuit to prune the clock tree, thus
disabling portions of the circuitry so that their flip-flops do not change state: their
switching power consumption goes to zero, and only leakage currents are incurred. Figure 4
shows a simple clock-gated register. Since there is only one register, the block is 100% clock
gated.

Figure 4: Clock gating
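The effect of gating on a single register can be shown with a small behavioral model. It is a sketch under simplifying assumptions: the gated clock is modeled as "clock AND enable" evaluated once per cycle, whereas practical gating cells usually latch the enable to avoid glitches. The class name `GatedRegister` is hypothetical.

```python
class GatedRegister:
    """One register whose clock is gated by an enable signal."""

    def __init__(self):
        self.q = 0
        self.clock_edges_seen = 0  # edges that actually reach the flip-flop

    def tick(self, d, enable):
        # The flip-flop only sees a clock edge when the gate is open.
        if enable:
            self.clock_edges_seen += 1
            self.q = d
        return self.q


# Five clock cycles of (data, enable); the register is enabled twice.
cycles = [(1, True), (0, False), (0, False), (1, True), (0, False)]
reg = GatedRegister()
for d, enable in cycles:
    reg.tick(d, enable)

print(reg.clock_edges_seen, "of", len(cycles), "edges reached the register")
```

In the cycles where the gate is closed the register holds its value and, in this model, sees no clock activity at all.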
III. Design Implementation
3.1 General Design Process
The design approach for this project, in terms of actual chip design, is centered on
independent module operations; that is, the various modules that make up the entire chip should
be implementable without depending on other modules. This approach allows different
parts of a chip to be designed in parallel, and therefore speeds up the process of implementing
chips.
Here is our design flow:
1. Simulation to verify the functionality
Since the source code of MIPS is available, we first verify the functionality of MIPS to make
sure it is correct.
2. Synthesis
After confirming the functionality, the next step is synthesis. During synthesis,
we tried different clock periods to find the one that minimizes the slack time. After
synthesis, we get the .rep and .pow files, which contain timing, area and power information.
3. Timing and Power Analysis
From the timing report file, we can determine the critical path, which is then used to
identify the modules that are not on the critical path and therefore to reduce the
voltage on those modules.
4. Floorplanning, Routing & Placement
5. Timing and Power Analysis
In order to get accurate timing and power information, we should use the files
generated by SoC Encounter: the .sdf (Standard Delay Format) file and the .spef files, which
contain the parasitic components of the circuit, gate information and RC parasitic information.
6. Padding
After placement and routing, we import the files into Cadence to get the
corresponding schematic and layout views of the design. After verifying that every module is
available, we use CCAR to connect them together into the chip.
Another important step is to put the core of the chip into the pad ring. By
changing the schematic and layout views of the pad ring, we make sure that it can be used for
our core.
Figure 5: The final clock-gated MIPS
3.2 MSV Implementation
In implementing the multi-voltage MIPS design, we faced several challenges along the way. The
first was the laborious insertion of voltage interface circuits. The second was
passing LVS on the chip.
The first problem was solved by manually adding voltage interface cells to the *struct.v file.
3.3.1 Modifying *struct.v for manual voltage interface insertion
For example, suppose we have a datapath_struct.v file in this format:
module datapath (clk, ...);
input clk, ...;
...
DFFX1 ir0_q_reg_0_ (.D(n512), .G(clk), .CLR(n273), .Q(instr[0]));

We had to manually change the datapath_struct.v file like this:
module datapath (clk, ...);
input clk, ...;
wire clkP;
...
INTERFV3X6 U100 (.A(clk), .Y(clkP));
DFFX1 ir0_q_reg_0_ (.D(n512), .G(clkP), .CLR(n273), .Q(instr[0]));
(Note: The index of the new cells must be maintained. If the previous maximum cell index
number was 99, the index of the added cells begins from 100.)
In this manner, we were able to make a chip that supports low-voltage input drive. The final
layout of the multi-voltage supply MIPS is shown in Figure 6.

Figure 6: The final layout of the multi-voltage supply MIPS. The voltage interfaces are
integrated into the datapath. For more detail, please refer to Section 3.3.1.
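The manual netlist edit of Section 3.3.1 was done by hand, but the same steps could in principle be scripted. The sketch below is an assumption-laden illustration, not the procedure we used: it assumes hand-added instances are named U<index>, that clocked cells take their clock on a pin named G, and that the first line starting with "input" is a complete declaration. The function name `insert_clock_interface` and the INVX1 cell in the example netlist are hypothetical.

```python
import re


def insert_clock_interface(netlist, cell="INTERFV3X6", src_net="clk", new_net="clkP"):
    """Buffer src_net through one interface cell and rewire all .G pins to it."""
    # The new cell index continues after the largest existing U<index>.
    indices = [int(n) for n in re.findall(r"\bU(\d+)\b", netlist)]
    next_index = max(indices, default=-1) + 1

    # Reconnect every clock pin from the original net to the shifted net.
    pin = r"\.G\(\s*" + re.escape(src_net) + r"\s*\)"
    lines = re.sub(pin, ".G(" + new_net + ")", netlist).splitlines()

    # Declare the new net and instantiate the interface cell after the inputs.
    for i, line in enumerate(lines):
        if line.lstrip().startswith("input"):
            lines[i + 1:i + 1] = [
                "wire " + new_net + ";",
                cell + " U" + str(next_index)
                + " (.A(" + src_net + "), .Y(" + new_net + "));",
            ]
            break
    return "\n".join(lines)


# Tiny stand-in netlist; the real *struct.v files are far larger.
example = """module datapath (clk, d, q);
input clk, d;
output q;
wire n1;
INVX1 U99 (.A(d), .Y(n1));
DFFX1 q_reg (.D(n1), .G(clk), .Q(q));
endmodule"""

patched = insert_clock_interface(example)
print(patched)
```

Because the largest existing index in the example is 99, the inserted cell is named U100, which follows the indexing note in Section 3.3.1.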
IV. Experiments and Analysis
1 Simulation
Before moving further, we should first make sure that the functionality of MIPS is correct.
2 Timing Analysis
We have timing reports from both Design Compiler (DC) and PrimeTime PX (PT), and we present
both of them here so that we can view the timing information from different perspectives. We
also compare the MIPS based on our library (Lib6710_02) with the MIPS based on the
UofU_Digital_v1_2 library.
2.1 Timing of Lib6710_02
First, timing information for the Lib6710_02-based MIPS is shown in Table I and Table II.
Table I: Delay of LIB6710 from DC
datapath   alucontrol   controller   critical path
13         3            6            19

Table II: Delay of LIB6710 from PT
datapath   alucontrol   controller   critical path
14         3            7            21

Because Lib6710_02 is used to generate a module-based MIPS, we first obtain each module's
delay and then add up only the delays that are on the critical path. Table I shows the
delay of LIB6710 from DC, and Table II shows the delay of LIB6710 from PT. The timing from PT
is greater than that from DC.
2.2 Timing of UofU_Digital_v1_2
Second, timing information for the UofU_Digital_v1_2-based MIPS is shown in Table III and
Table IV.
Table III: Delay of UofU_Digital_v1_2 from DC
         baseline UofU   clock gating UofU
timing   19              33
rate     1               1.74

Table IV: Delay of UofU_Digital_v1_2 from PT
         baseline UofU   clock gating UofU
timing   20              33
rate     1               1.65
Table III shows the baseline and clock-gating delays from DC. The clock-gating delay is about
74% greater than the baseline delay. The reason is that clock-gating logic is inserted at every
flip-flop where it is possible; as a result, the delay increases. Table IV shows the baseline
and clock-gating delays from PT, where for the same reason the clock-gating delay is about 65%
greater than the baseline delay.
2.3 Timing Comparison and Results
Third, we compare all the delays, as illustrated by Table V and Table VI.
Table V: Delay comparison from DC
         baseline UofU   clock gating UofU   baseline LIB6710
timing   19              33                  20
rate     1               1.74                1.05

Table VI: Delay comparison from PT
         baseline UofU   clock gating UofU   baseline LIB6710
timing   19              33                  21
rate     1               1.74                1.11

The results show that the baseline version of MIPS built with the UofU_Digital_v1_2 library
has the smallest delay, ahead of both the clock-gating version built with the same library and
the baseline version built with Lib6710_02.
3 Power Analysis
We also have power reports from both DC and PT, and we again compare the MIPS based on our
library (Lib6710_02) with the MIPS based on the UofU_Digital_v1_2 library.
3.1 Power of Lib6710_02
Table VII: Power information of Lib6710_02 from DC
             Cell Internal   Net Switching   Cell Leakage   Total
datapath     40.8318 mW      5.9885 mW       97.5118 nW     46.8203 mW
alucontrol   1.7168 mW       595.7427 uW     883.3262 pW    2.3125 mW
controller   4.2593 mW       1.1026 mW       6.3379 nW      5.3619 mW
total        42.5486 mW      6.5842 mW       104.7330 nW    54.4947 mW

Table VIII: Power information of Lib6710_02 from PT
             Cell Internal   Net Switching   Cell Leakage   Total
datapath     30.8 mW         26.8 mW         107.9 nW       57.6 mW
alucontrol   0.3776 mW       0.2014 mW       0.9509 nW      0.5791 mW
controller   4.003 mW        3.212 mW        7.960 nW       7.215 mW
total        35.1559 mW      30.2134 mW      116.8109 nW    65.3694 mW

For the same reason, we first obtain each module's power information and then add up
the power of all the modules.
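Because the reports mix mW, uW, nW and pW, the per-module sums are easiest to check after converting everything to one unit. The sketch below does this for the DC figures of Table VII; the helper name `to_milliwatts` and the dictionary layout are assumptions made for illustration.

```python
UNIT_TO_MW = {"mW": 1.0, "uW": 1e-3, "nW": 1e-6, "pW": 1e-9}


def to_milliwatts(text):
    """Convert a report value such as '595.7427uW' to milliwatts."""
    text = text.strip()
    for unit, scale in UNIT_TO_MW.items():
        if text.endswith(unit):
            return float(text[: -len(unit)]) * scale
    raise ValueError("unknown power unit in " + repr(text))


# (cell internal, net switching, cell leakage) per module, from Table VII.
table_vii = {
    "datapath": ("40.8318mW", "5.9885mW", "97.5118nW"),
    "alucontrol": ("1.7168mW", "595.7427uW", "883.3262pW"),
    "controller": ("4.2593mW", "1.1026mW", "6.3379nW"),
}

# Total power of a module is the sum of its three components.
module_totals = {
    name: sum(to_milliwatts(value) for value in parts)
    for name, parts in table_vii.items()
}
chip_total = sum(module_totals.values())

for name, total in module_totals.items():
    print(name, round(total, 4), "mW")
print("chip", round(chip_total, 4), "mW")
```

The leakage terms are several orders of magnitude smaller than the other two, so they barely affect the totals at the precision reported.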
3.2 Power of UofU_Digital_v1_2
Table 9: Power information of UofU_Digital_v1_2 from DC
                baseline UofU   clock gating UofU
Cell Internal   38.6831 mW      11.4030 mW
Net Switching   2.3641 mW       1.2758 mW
Cell Leakage    113.6330 nW     103.6765 nW
total           41.0473 mW      12.6789 mW
rate            1.0000          0.3089

Table 10: Power information of UofU_Digital_v1_2 from PT
                baseline UofU   clock gating UofU
Cell Internal   37.7 mW         11.3 mW
Net Switching   10.3 mW         5.110 mW
Cell Leakage    121.2 nW        106.4 nW
total           48.0000 mW      16.4101 mW
rate            1               0.3419

As the tables illustrate, for the same library, clock gating saves a large amount of power:
a reduction of at least 65% is achieved.

3.3 Power Comparison and Results

Table 11: Power comparison from DC
                baseline UofU   clock gating UofU   baseline LIB6710
Cell Internal   38.6831 mW      11.4030 mW          42.5486 mW
Net Switching   2.3641 mW       1.2758 mW           6.5842 mW
Cell Leakage    113.6330 nW     103.6765 nW         104.7330 nW
total           41.0473 mW      12.6789 mW          54.4947 mW
rate            1.0000          0.3089              1.3276

Table 12: Power comparison from PT
                baseline UofU   clock gating UofU   baseline LIB6710
Cell Internal   37.7 mW         11.3 mW             35.1559 mW
Net Switching   10.3 mW         5.110 mW            30.2134 mW
Cell Leakage    121.2 nW        106.4 nW            116.8109 nW
total           48.0000 mW      16.4101 mW          65.3694 mW
rate            1               0.3419              1.3619
We compare all the power figures in Table 11 and Table 12. The results
show that the clock-gating version consumes less power than the other two versions, by up to
about a factor of four.

4 Area Report
4.1 Area of Lib6710_02

Table 13: Area of Lib6710_02
             Combinational   Noncombinational   Total
datapath     143986          132192             276178
alucontrol   3175            0                  3175
controller   17042           3888               20930
total        164203          136080             300283
For the same reason, we first obtain each module's area and then add up
the areas of all the modules to get the total area.
4.2 Area of UofU_Digital_v1_2

Table 14: Area of UofU_Digital_v1_2 from DC
                   baseline UofU   clock gating UofU
Combinational      3259            2441
Noncombinational   2520            2663
Total              5779            5104
rate               1               0.88

4.3 Area Comparison and Results

Table 15: Area comparison
                   baseline UofU   clock gating UofU   baseline LIB6710
Combinational      3259            2441                164203
Noncombinational   2520            2663                136080
Total              5779            5104                300283
rate               1               0.88                51.96
The clock-gating version has the smallest area. However, the numbers here only
reflect the number of cells.
The only reason for the increased number of cells in the Lib6710_02 version is that the
UofU_Digital_v1_2 library is better than the Lib6710_02 library.
V. Conclusion and Discussion
Our team has successfully built four versions of MIPS:
1. flattened baseline MIPS microprocessor
2. clock-gated flattened MIPS microprocessor
3. modular baseline MIPS microprocessor
4. modular three-voltage MIPS microprocessor
We were able to make a full comparison of the clock-gated and baseline MIPS processors.
However, a comparison of the three-voltage MIPS processor with the baseline was not possible
because we could not figure out how to get the *.sdf file from the LVS-bypassed layout. (Please
read the LVS-bypassing scheme we elaborated on in the Implementation section.)
5.1 Clock-Gated MIPS Processor
According to our experiments, power is dramatically reduced in the
clock-gated MIPS processor: almost 65% of the baseline's power is saved.
However, there are tradeoffs between speed, power and area. From the clock-gating
experiment, we obtained power and area reductions, but the performance went down. The
performance reduction was about 42%. The delay*power metric is therefore (1.74 * 0.342)
= 0.595 relative to the baseline, which means that the clock-gated chip is about 68% more
efficient than the baseline when power and performance are given equal weight.
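The figures in this subsection can be reproduced from the reported totals. The sketch below uses the DC delays of Table III and the PT power totals of Table 10, which is the combination behind the 1.74 and 0.342 factors above; the variable names are illustrative.

```python
# Delay from DC (Table III) and total power in mW from PT (Table 10).
baseline = {"delay": 19, "power_mw": 48.0}
clock_gated = {"delay": 33, "power_mw": 16.4101}

delay_ratio = clock_gated["delay"] / baseline["delay"]        # about 1.74
power_ratio = clock_gated["power_mw"] / baseline["power_mw"]  # about 0.342

# Performance is taken as the inverse of delay.
performance_drop = 1 - 1 / delay_ratio

# Power-delay product of the gated design relative to the baseline.
pdp_ratio = delay_ratio * power_ratio

# How much more efficient the gated design is, weighting both equally.
efficiency_gain = 1 / pdp_ratio - 1

print(f"performance drop: {performance_drop:.1%}")
print(f"relative power-delay product: {pdp_ratio:.3f}")
print(f"efficiency gain: {efficiency_gain:.1%}")
```

Using the unrounded ratios gives a product slightly below 0.595, but the conclusions of roughly 42% lower performance and roughly 68% better power-delay efficiency are unchanged.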
5.2 Multi-Voltage MIPS Processor
From the multi-voltage supply perspective, although the performance of the chip remains the
same, the area of the chip increases as a penalty for the reduction in power. In essence, we
have to make a tradeoff between area, power and performance to meet our requirements.
There are several potential reasons for the failure of the unflattened MIPS with a
multi-voltage supply. First, VDD and GND are not in the symbols of the modules, so if we try
to implement more than one VDD in the layout of the chip, they are not reflected in the
schematic of the chip. Therefore, the schematic and the layout of the chip do not match.
Second, it is possible that Cadence allows us to have only one voltage supply; if we want to
supply the chip with multiple voltages, we have to do so during the SOC process. Third, the
pin type of the pad ring may affect the result. We tried to set the pin type of VDD to
pad_io_nores, but it did not work.

5.3 Conclusion
Our project has implemented a baseline MIPS (8-bit) microprocessor and a clock-gated
version of the baseline. Based on these two versions of MIPS, a detailed analysis
was made. MSV, however, is quite a hard method to implement. We
implemented the MSV-applied MIPS processor up to the LVS step, but our Cadence
tool does not recognize multiple voltages correctly. This drawback is compensated for by
PrimeTime PX, which can generate static
power and timing information for MSV. As a result, we conducted all the comparisons
and made a detailed analysis of them.

