
PAPER PRESENTED ON:

DISTRIBUTED LOW POWER EMBEDDED SYSTEM

ABSTRACT
A multiple-processor system can potentially achieve higher energy savings than a single processor, because the reduced workload on each processor creates new opportunities for dynamic voltage scaling (DVS). However, as the cost of communication starts to match or surpass that of computation, many new challenges arise in making DVS effective in a distributed system under communication-intensive workload. This paper discusses implementation issues for supporting DVS on distributed embedded processors. We implemented and evaluated four distributed schemes: (1) DVS during I/O, (2) partitioning, (3) power-failure recovery, and (4) node rotation. We validated the results on a distributed embedded system with the Itsy pocket computers connected by serial links. Our experiments confirmed that a distributed system can create new DVS opportunities and achieve further energy savings. However, a surprising result was that aggregate energy savings do not translate directly into a longer battery life. In fact, the best partitioning scheme, which distributed the workload onto two nodes and enabled the most power-efficient CPU speeds at 59-103.2 MHz, resulted in only a 15% improvement in battery lifetime. Of the four techniques evaluated, node rotation showed the most measurable improvement to battery lifetime, at 45%, by balancing the discharge rates among the nodes.

1 Introduction

Dynamic voltage scaling (DVS) is one of the most studied topics in low-power embedded systems. Based on CMOS characteristics, the power consumption is proportional to V^2, while the supply voltage V is linearly proportional to the clock frequency. To fully exploit such quadratic power vs. voltage scaling effects, previous studies have extensively explored DVS with real-time and non-real-time scheduling techniques. As DVS reaches its limit on a single processor, researchers turn to multiple processors to create additional opportunities for DVS. Multiple processors can potentially achieve higher energy savings than a single processor. By partitioning the workload onto multiple processors, each processor is responsible for only a fraction of the workload and can operate at a lower voltage/frequency level with quadratic power savings. Meanwhile, the lost performance can be compensated by the increased parallelism. Another advantage of a distributed scheme is that heterogeneous hardware such as DSPs and other accelerators can further improve the power efficiency of various stages of the computation through specialization. Although a tightly coupled, shared-memory multiprocessor architecture may have more power/performance advantages, it is not as scalable as distributed, message-passing schemes.

While distributed systems have many attractive properties, they pay a higher price for message-passing communication. Each node must handle not only I/O with the external world, but also I/O on the internal network. Programming for distributed systems is also inherently more difficult than for single processors. Although higher-level abstractions have been proposed to facilitate distributed programming, these abstraction layers generate even more inter-processor communication traffic behind the scenes. While this may be appropriate for high-performance cluster computers with multi-tier, multi-gigabit switches like Myrinet or Gigabit Ethernet, such high-speed, high-power communication media are not realistic for battery-powered embedded systems. Instead, the low-power requirement has constrained the communication interfaces to much slower, often serial interfaces such as I2C and CAN. As a result, even if the actual data workload is not large on an absolute scale, it appears expensive relative to the computation performance that can be delivered by today's low-power embedded microprocessors.

The effect of I/O on embedded systems has not been well studied in existing DVS work. Many existing DVS techniques have shown impressive power savings on a single processor. However, few results have been fully qualified in the context of an entire system. Even fewer have been validated on actual hardware. One common simplifying assumption is to ignore I/O.

2 Related Work

Real-time scheduling has been extended to DVS scheduling on variable-voltage processors. The initial scheduling model was introduced by Yao et al. [10], then extended and refined by Yasuura [6], Quan [7], and many studies on variations of real-time scheduling problems. Since power is a quadratic function of the supply voltage, lowering the voltage can result in significant savings while still enabling the processor to make enough progress that the tasks complete before their deadlines. These techniques often focus on the energy reduction of the processor only, while the power consumption of other components, including memory and I/O, is ignored. The results are rarely validated on real hardware.
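The quadratic saving claimed above can be made concrete with a minimal sketch of the standard CMOS dynamic-power model. The effective switched capacitance and the voltage/frequency pairs below are illustrative placeholders, not measured Itsy values; only the 206.4 MHz peak rate comes from this paper.

```python
# Minimal sketch of the CMOS dynamic-power model behind DVS.
# C_EFF and the voltage levels are illustrative assumptions.

def dynamic_power(c_eff, v, f):
    """Dynamic CMOS power: P = C_eff * V^2 * f."""
    return c_eff * v * v * f

def energy_for_work(c_eff, v, f, cycles):
    """Energy to execute a fixed number of cycles at (V, f): P * t."""
    return dynamic_power(c_eff, v, f) * (cycles / f)

C_EFF = 1e-9          # farads, illustrative
CYCLES = 2.064e8      # about 1 second of work at 206.4 MHz

full = energy_for_work(C_EFF, 1.50, 206.4e6, CYCLES)   # hypothetical 1.5 V
half = energy_for_work(C_EFF, 0.75, 103.2e6, CYCLES)   # hypothetical 0.75 V

print(f"full speed: {full:.3f} J, half speed: {half:.3f} J")
print(f"ratio: {half/full:.2f}")  # ~0.25: quadratic saving, at 2x the time
```

Halving the frequency (and, ideally, the voltage) takes twice as long but spends roughly a quarter of the energy, which is the effect the scheduling techniques above try to exploit.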
DVS has been applied to benchmark applications such as JPEG and MPEG in embedded systems. Im et al. [4] propose to buffer incoming tasks so that the idle period between task arrivals can be utilized by DVS.

Shin et al. [8] introduce an intra-task DVS scheme that maximally utilizes the slack time within one task. Choi et al. [2] present a DVS scheme for an MPEG decoder that takes advantage of the different types of frames in the MPEG stream. These techniques can be validated with measurements on real or emulated platforms. However, they are also computation-oriented, such that the processor performs only very little, if any, I/O. The impact of I/O still remains under-studied.

DVS has recently been extended to multiprocessor systems. Weglarz [9] proposes partitioning the computation onto a multiprocessor architecture that consumes significantly less power than a single processor. There is a fundamental difference in applying techniques for multiprocessors to distributed systems. Minimizing the global energy consumption will extend the battery life only if the whole system is assumed to be powered by a single battery unit. In a distributed environment, each node is powered by a dedicated battery. Even a globally optimal solution may cause poor battery efficiency locally and result in shortened system uptime as well as loss of battery capacity. Maleki et al. [5] analyze the energy efficiency of routing protocols in an ad-hoc network and show that globally optimal schemes often contradict the goal of extending the lifetime of a distributed system.

3 Motivating Example

We select an image processing algorithm, automatic target recognition (ATR), as our motivating example to evaluate a realistic application under I/O pressure. Its block diagram is shown in Fig. 1. The algorithm detects pre-defined targets in an input image. For each target, a region of interest is extracted and filtered by templates. Finally, the distance of each target is computed. A throughput constraint is imposed such that the image frames must be processed at a fixed rate.

We evaluate a few DVS schemes on one or more embedded processors that perform the ATR algorithm. Unlike many DVS studies that ignore I/O, we assume that all nodes in our system are connected to a communication network. This network carries data from external sources (e.g., a camera, sensor, etc.), internal communications between the nodes, and data to an external destination (e.g., a PC). This study assumes that only one image and one target are processed at a time, although a multi-frame, multi-target version of the algorithm is also available.

We refer to each embedded processor as a node. A node is a full-fledged computer system with a voltage-scalable processor, I/O devices, and memory. Each node performs a computation task IMAGE and two communication tasks RECV and SEND. RECV receives data from the external source or another node. The data is processed by IMAGE, which consists of one or more functional blocks of the ATR algorithm. The result is transmitted by task SEND to another node or the destination. Due to data dependencies, tasks RECV, IMAGE, and SEND must be fully serialized on each node. In addition, they must complete within a time period called the frame delay D, which is defined as the performance constraint. Fig. 2 illustrates the timing vs. power diagram of a node.
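The serialization constraint can be checked mechanically: pick the lowest clock rate at which RECV + IMAGE + SEND still fit within D. The sketch below assumes compute time scales linearly with clock rate (as reported in Section 4.3) while I/O time does not; the frequency list is the SA-1100's eleven levels, assumed evenly spaced between the 59 and 206.4 MHz endpoints given in Section 4.1, and the task times are the baseline values from Section 5.1.

```python
# Sketch: lowest feasible clock rate under the frame-delay constraint.
# Assumes linear compute scaling and clock-insensitive serial I/O.

FREQS_MHZ = [59.0, 73.7, 88.5, 103.2, 118.0, 132.7, 147.5,
             162.2, 176.9, 191.7, 206.4]  # assumed 11 SA-1100 levels

def min_feasible_freq(compute_s_at_peak, recv_s, send_s, d=2.3):
    """Lowest clock rate at which RECV + IMAGE + SEND still fit in D."""
    for f in FREQS_MHZ:  # ascending: first feasible level is lowest power
        compute_s = compute_s_at_peak * (206.4 / f)
        if recv_s + compute_s + send_s <= d:
            return f
    return None  # infeasible even at peak speed

# Baseline-like node: 1.1 s compute at peak, 1.1 s receive, 0.1 s send.
print(min_feasible_freq(1.1, 1.1, 0.1))  # 206.4: no slack left for DVS
```

With the baseline workload the node is feasible only at peak speed, which is why the single-node configuration leaves no room for DVS on computation.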

When multiple nodes are configured as a distributed system, we organize them in a pipeline for the ATR algorithm. Fig. 3 shows the timing vs. power diagram of two pipelined nodes performing the ATR algorithm. Node1 maps the first two functional blocks, and Node2 performs the other two blocks. Node1 receives one frame from the data source, processes the data, and sends the intermediate result to Node2 within D seconds. After Node2 starts receiving from Node1, it finishes its share of the computation and sends the final result to the destination within D seconds. Fig. 3 shows that if the data source keeps producing one frame every D seconds, and both Node1 and Node2 can also deliver their results within D seconds, then the distributed pipeline is able to provide one result every D seconds to the destination.
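The steady-state behavior can be verified with a toy schedule; this is a sketch under the stated assumption that every stage meets its deadline D, with arrival times chosen for illustration.

```python
# Toy timing model of the two-node pipeline in Fig. 3: Node1 forwards
# frame k by (k+1)*D and Node2 delivers it by (k+2)*D.

D = 2.3  # frame delay, seconds

def pipeline_output_times(n_frames, d=D):
    """(source emit, Node1 -> Node2, result delivered) per frame."""
    return [(k * d, (k + 1) * d, (k + 2) * d) for k in range(n_frames)]

for k, (t_in, t_mid, t_out) in enumerate(pipeline_output_times(4)):
    print(f"frame {k}: enters {t_in:.1f}s, forwarded by {t_mid:.1f}s, "
          f"result by {t_out:.1f}s")
# After a 2*D fill latency, one result appears every D seconds.
```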

4 Experimental Platform

We use the Itsy pocket computers as distributed nodes, connected by a TCP/IP network over serial links. We present the performance and power profiles of the ATR algorithm running on the Itsy computers and define the metrics for our experiments.
4.1 The Itsy Pocket Computer

The Itsy pocket computer is a full-fledged miniaturized computer system developed by Compaq Western Research Lab [1, 3]. It supports DVS on the StrongARM SA-1100 processor with 11 frequency levels from 59 to 206.4 MHz over 43 different voltage levels. Itsy also has 32 MB flash memory for off-line storage, 32 MB DRAM as the main memory, and a RAM disk. The power supply is a 4 V lithium-ion battery pack. Due to the power density constraint of the battery, Itsy currently does not support high-speed I/O such as Ethernet or USB. The applicable I/O ports are a serial port and an infrared port. Itsy runs Linux with networking support. Its block diagram is shown in Fig. 4.
4.2 Network Configuration

We currently use the serial port as the network interface. We set up a separate host computer as both the external source and destination. It connects the Itsy nodes through multiple serial ports established by USB/serial adaptors. We set up individual PPP (point-to-point protocol) connections between each Itsy node and the host computer. Therefore the host computer acts as the hub for multiple PPP networks, and it assigns a unique IP address to each Itsy node. Finally, we start the IP forwarding service on the host computer to allow the Itsy nodes to communicate with each other transparently, as if they were on the same TCP/IP network. The network configuration is shown in Fig. 5.

The serial link might not be the best choice of interconnect, but it is often used in real life due to power constraints. A high-speed network interface requires several watts of power, which is too high for battery-powered embedded systems such as Itsy. In this paper, our primary goal is to investigate the new opportunities for DVS-enabled power vs. performance trade-offs in distributed embedded systems with intensive I/O. Given the limitations of serial ports, we do not intend to propose our experimental platform as a prototype of a new distributed network architecture. We chose this network platform primarily because it represents the state of the art in power management capabilities. It is also consistent with the relatively expensive communication, both in terms of time and energy, seen by such systems. We expect that our findings can be applied to many communication-intensive applications on other network architectures, where communication is a key factor for both performance and power management.
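Since each node simply receives frames from its upstream neighbor and forwards results downstream over TCP/IP, a pipeline stage reduces to a small relay loop. The sketch below is illustrative only: the port, the downstream PPP address, and the frame size are hypothetical, and the real system's framing and ATR code are not shown.

```python
# Sketch of one pipeline stage on this network: accept frames from the
# upstream node over TCP, process them, forward the result downstream.

import socket

UPSTREAM_PORT = 5000                  # where we listen (assumed)
DOWNSTREAM = ("192.168.1.3", 5000)    # next node's PPP address (assumed)
FRAME_BYTES = 16 * 1024               # placeholder frame size

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("upstream closed")
        buf += chunk
    return buf

def process(frame):          # stand-in for this node's ATR blocks
    return frame[:1024]      # pretend the result is smaller than the input

def relay_forever():
    srv = socket.socket()
    srv.bind(("", UPSTREAM_PORT))
    srv.listen(1)
    up, _ = srv.accept()
    while True:
        frame = recv_exact(up, FRAME_BYTES)
        result = process(frame)
        with socket.create_connection(DOWNSTREAM) as down:
            down.sendall(result)
```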
4.3 Performance Profile of the ATR Algorithm

Each single iteration of the entire ATR algorithm takes 1.1 seconds to complete on one Itsy node running at the peak clock rate of 206.4 MHz. When the clock rate is reduced, the performance degrades linearly with the clock rate. The PPP connection over the serial port has a maximum data rate of 115.2 Kbps, though our measured data rate is roughly 80 Kbps. In addition, the startup time for establishing a single communication transaction takes 50-100 ms. The computation and communication behaviors are profiled and summarized in Fig. 6. The functional blocks can be all combined onto one node or distributed onto multiple nodes in a pipeline. In the single-node case there are no communications between adjacent nodes, although the node still has to communicate with the host computer.
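These link parameters make back-of-the-envelope I/O costs easy to estimate, as the sketch below shows. The 80 Kbps effective rate and 50-100 ms startup come from our measurements; the payload sizes are hypothetical, except the 700-byte inter-node result reported in Section 6.5.

```python
# Sketch: per-transaction serial I/O cost = startup + payload / throughput.

def transfer_time(payload_bytes, rate_kbps=80.0, startup_s=0.075):
    """Assumed mid-range 75 ms startup plus measured ~80 Kbps throughput."""
    return startup_s + (payload_bytes * 8) / (rate_kbps * 1000.0)

for size in (700, 16 * 1024):   # small result vs. a raw image (assumed size)
    print(f"{size:6d} B -> {transfer_time(size):.2f} s")
# 700 B is startup-dominated (~0.15 s); an assumed 16 KB image takes ~1.7 s,
# longer than the 1.1 s peak-speed compute time of the whole algorithm.
```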
4.4 Power Profile of the ATR Algorithm

Fig. 7 shows the net current draw of one Itsy node. The horizontal axis represents the frequency and corresponding voltage levels. The data are collected by Itsy's built-in power monitor. During all experiments the LCD screen and the speaker are turned off to eliminate unnecessary power consumption. The execution of the ATR algorithm on Itsy has three modes of operation: idle, communication, and computation. In idle mode, the Itsy node has neither I/O nor any computation workload. In communication mode, it is either sending or receiving data through the serial port. In computation mode, it executes the ATR algorithm. Fig. 7 shows the three curves ranging from 30 mA to 130 mA, indicating a power range from 0.1 W to 0.5 W. Computation always dominates the power consumption. However, due to the slow data rate of the serial port, communication tasks have long delays and thus consume a significant amount of energy, even though the communication power level is not the highest. As a result, I/O energy becomes a primary optimization target in addition to DVS on computation.
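Per-frame energy follows directly from the current profile: each serialized phase contributes V_batt * I * t. In the sketch below, the 4 V supply comes from Section 4.1 and the currents are illustrative readings within the reported 30-130 mA range, not exact values from Fig. 7.

```python
# Sketch: per-frame energy as the sum of per-phase V * I * t terms.

V_BATT = 4.0  # volts, from the Itsy battery pack

def frame_energy(phases):
    """phases: list of (current_amps, duration_s) for RECV/IMAGE/SEND."""
    return sum(V_BATT * i * t for i, t in phases)

baseline = [(0.055, 1.1),   # RECV at an assumed 55 mA
            (0.130, 1.1),   # IMAGE at peak speed, assumed 130 mA
            (0.055, 0.1)]   # SEND
print(f"{frame_energy(baseline):.2f} J per frame")
# Communication draws less current than computation, but its long duration
# on the slow link makes it a large share of the frame energy.
```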
4.5 Metrics

We evaluate several DVS techniques through a series of experiments with one or more Itsy nodes. The baseline configuration is a single Itsy node running the entire ATR algorithm at the highest clock rate. It is able to produce one result every D seconds. For all experiments, we fix this frame delay D as the performance constraint and keep the Itsy node(s) running until the battery is fully discharged. The energy metric can be measured by the battery life T(N) when N nodes with N batteries are being used. The completed workload W(N) is the number of frames completed before battery exhaustion. The battery life in the baseline configuration is T(1). Since the frame delay D is fixed, the host computer transmits one frame to the first Itsy node every D seconds.

5 Techniques under Evaluation

We first define the baseline configuration as a reference against which to compare experimental results. We then briefly review the DVS techniques to be evaluated in our experiments.
5.1 Baseline Configuration

The baseline configuration is a single Itsy node performing the entire ATR algorithm. It operates at the highest CPU clock rate of 206.4 MHz. The processing task IMAGE requires 1.1 seconds to complete. The node also needs 1.1 and 0.1 seconds to receive and send data, respectively. Therefore the total time to process one frame is D = 2.3 seconds. Based on the metrics defined in Section 4.5, we fix this frame delay D = 2.3 seconds in all experiments.
5.2 DVS during I/O

The first technique is to perform DVS during the I/O periods. Since the application is tightly constrained on timing with expensive I/O delay, there is not much opportunity for DVS on computation without a performance penalty. On the other hand, since the Itsy node spends a long time on communication, it is possible to apply DVS during I/O. Based on the power characteristics shown in Fig. 7, I/O can operate at a significantly lower power level at the slowest frequency of 59 MHz.
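In implementation terms, the technique amounts to dropping the clock before each serial transaction and restoring it afterward. The sketch below is a minimal illustration: set_cpu_freq is a hypothetical hook standing in for the Itsy's platform-specific frequency-setting mechanism, and the recv/process/send callables are placeholders.

```python
# Sketch of "DVS during I/O": lowest frequency around serial transactions,
# full speed for IMAGE. The frequency-setting hook is hypothetical.

from contextlib import contextmanager

def set_cpu_freq(mhz):
    print(f"[freq -> {mhz} MHz]")   # placeholder for the real mechanism

@contextmanager
def io_speed(io_mhz=59.0, compute_mhz=206.4):
    set_cpu_freq(io_mhz)            # serial throughput is clock-insensitive
    try:
        yield
    finally:
        set_cpu_freq(compute_mhz)   # back to full speed for computation

def handle_frame(recv, process, send):
    with io_speed():
        frame = recv()
    result = process(frame)         # runs at 206.4 MHz
    with io_speed():
        send(result)
```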
5.3 Distributed DVS by Partitioning

Partitioning the algorithm onto multiple nodes can create more time per slot for DVS on each distributed node. However, since the application is already I/O-bound, additional communication between nodes can further increase the I/O pressure. A few concerns must be taken into account to correctly balance computation and I/O. First, each node must be able to complete its tasks RECV, IMAGE, and SEND within D = 2.3 seconds. With an unbalanced partitioning, a node can be overloaded with either excessive I/O or heavy computation, such that it cannot finish its work on time and the whole pipeline will fail to meet the performance constraint. Second, additional communication can potentially saturate the network such that none of the nodes can guarantee to finish their workload on time. Finally, the distributed system should deliver an extended battery life in normalized terms, not just a longer absolute uptime. We experiment with two Itsy nodes, although the results do generalize to more nodes. Based on the block diagram in Fig. 6, three partitioning schemes are available; they are illustrated in Fig. 8. The first scheme, where Node1 is only responsible for target detection and Node2 performs the remaining three functional blocks, is clearly the best of the three solutions. Due to the least amount of I/O, both nodes are allowed to run at much lower clock rates.
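The selection can be framed as a small feasibility search over contiguous cuts of the block pipeline, charging each stage its compute time plus its I/O. The sketch below uses placeholder per-block compute times and payload sizes, not our profiled values; with these particular assumptions it happens to reproduce the 59/103.2 MHz operating point of the best scheme.

```python
# Sketch: enumerate two-node cuts of the 4 ATR blocks and find, for each,
# the lowest feasible frequency pair under the frame delay D.

FREQS_MHZ = [59.0, 103.2, 206.4]      # subset of the 11 levels, for brevity
D = 2.3

def io_time(nbytes, rate_kbps=80.0, startup_s=0.075):
    return startup_s + nbytes * 8 / (rate_kbps * 1000)

def min_freq(compute_at_peak, io_s):
    for f in FREQS_MHZ:               # ascending: first feasible is best
        if io_s + compute_at_peak * 206.4 / f <= D:
            return f
    return None

# (compute seconds at peak, output bytes) per block -- illustrative only
blocks = [(0.1, 700), (0.4, 4000), (0.4, 4000), (0.2, 700)]
image_in, result_out = 16 * 1024, 700  # assumed external I/O sizes

for cut in range(1, len(blocks)):
    s1, s2 = blocks[:cut], blocks[cut:]
    f1 = min_freq(sum(b[0] for b in s1),
                  io_time(image_in) + io_time(s1[-1][1]))
    f2 = min_freq(sum(b[0] for b in s2),
                  io_time(s1[-1][1]) + io_time(result_out))
    print(f"cut after block {cut}: Node1 @ {f1} MHz, Node2 @ {f2} MHz")
```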
5.4 Distributed DVS with Power Failure Recovery

In general, it is impossible to evenly distribute the workload across the nodes of a distributed system. In many cases even the optimal partitioning scheme yields a very unbalanced workload distribution. In our experiments, Node2, with more workload, has to run faster, and thus its battery will be exhausted sooner. After one node fails, the distributed pipeline will simply stall, although the remaining nodes still have sufficient battery capacity to keep working. This results in unnecessary loss of battery capacity. One potential solution is to recover from the power failure on one node by detecting the faulting node dynamically and migrating its computation to neighboring nodes. Such techniques normally require additional control messages between nodes, thereby increasing I/O pressure on the already I/O-bound application. Since these messages also cost time, they force an increase in computation speed such that the node will fail even sooner. As a proof of concept, we implement a fault recovery scheme as follows. Each sending transaction must be acknowledged by the receiver. A timeout mechanism is used on each node to detect the failure of neighboring nodes. The computation share of the failed node then migrates to one of its neighboring nodes. The message reporting a faulting node can be encapsulated into the sending data stream and the acknowledgment, so the information can be propagated to all nodes in the system. As mentioned in Section 4.3, the acknowledgment signal requires a separate transaction, which typically costs 50-100 ms in addition to the extended I/O delay. Since the frame delay D is fixed, the processor must run faster to meet the timing constraint despite the increased I/O delay needed to support the power failure recovery mechanism.
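The core of the scheme is the acknowledge-or-migrate decision on each send. The sketch below illustrates that logic under assumed details: the 5-second timeout, the ACK payload, and the migration bookkeeping are all hypothetical, not our actual protocol constants.

```python
# Sketch of the ack/timeout failure detector: a send that is not
# acknowledged in time marks the neighbor failed and triggers migration.

import socket

ACK, TIMEOUT_S = b"ACK", 5.0  # assumed message and timeout

def send_with_ack(addr, payload):
    """Returns True on success, False if the neighbor appears dead."""
    try:
        with socket.create_connection(addr, timeout=TIMEOUT_S) as s:
            s.sendall(payload)
            s.settimeout(TIMEOUT_S)
            return s.recv(len(ACK)) == ACK
    except (socket.timeout, OSError):
        return False

def send_or_migrate(addr, payload, my_blocks, neighbor_blocks, run_blocks):
    if send_with_ack(addr, payload):
        return my_blocks
    # Neighbor presumed failed: absorb its share of the pipeline. The
    # merged node must now run faster to fit both shares within D.
    print("neighbor down; migrating its blocks locally")
    run_blocks(payload, neighbor_blocks)
    return my_blocks + neighbor_blocks
```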
5.5 Distributed DVS with Node Rotation

As an alternative to the power failure recovery scheme in Section 5.4, we balance the load on each node more efficiently with a new technique: the nodes periodically exchange their roles in the pipeline, so that each battery alternates between the heavy and the light share of the workload. If all nodes are evenly balanced, then after the first battery fails, the other batteries will also be exhausted shortly, leaving little stranded capacity.

6 Experimental Results

We evaluate the DVS techniques described in Section 5 through experiments, then analyze the results in the context of a distributed, I/O-bound system.
6.1 (0A) (0B) Initial Evaluation without I/O

Before experimenting with DVS under I/O, we first perform two simple experiments on a single Itsy node to explore the potential of DVS without I/O. The single Itsy node reads local copies of the raw images and only computes the results, instead of receiving images from the host and sending the results back. Therefore no communication delay or energy consumption is involved. (0A): We use one Itsy node to keep running the entire ATR algorithm at the full speed of 206.4 MHz. Its battery is exhausted in 3.4 hours, with 11.5K frames completed. (0B): We set up a second Itsy node to execute at the half speed of 103.2 MHz. It is able to continue operating for 12.9 hours, finishing 22.5K frames. At the half clock rate, the Itsy computer can complete twice as much workload as it can at full speed. We overload the metrics notation defined in Section 4.5 as follows: T maps the experiment label to the total battery life, and W maps the experiment label to the number of frames processed. Here, T(0A) = 3.4 (hours), W(0A) = 11500, T(0B) = 12.9, W(0B) = 22500. Note that these results are not to be compared with the other experiments, since there is no communication and no performance constraint. The results are promising for having more nodes in a distributed system. By using two Itsy nodes running at half speed, the system should be able to deliver the same performance as one Itsy node at full speed, while completing four times the workload by using two batteries. However, such an upper bound can only be achieved without the presence of I/O.
6.2 (1) Baseline Configuration

We defined the baseline configuration in Section 5.1. The single Itsy node running at 206.4 MHz lasts for 6.13 hours and finishes 9.6K frames before the battery dies. That is, T(1) = Tnorm(1) = 6.13, W(1) = 9600, Rnorm(1) = 100%. Compared with experiment (0A) without I/O, the completed workload is 17% less, since the node must spend a long time on I/O.
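For the multi-node experiments that follow, the normalized quantities are computed as in this small sketch: battery life is divided by the number of batteries, and Rnorm expresses it as a percentage of the baseline T(1).

```python
# Sketch of the normalized metrics used throughout Section 6.

def t_norm(t_hours, n_batteries):
    return t_hours / n_batteries

def r_norm(t_hours, n_batteries, t_baseline=6.13):
    return 100.0 * t_norm(t_hours, n_batteries) / t_baseline

print(f"{r_norm(6.13, 1):.0f}%")   # baseline: 100% by definition
print(f"{r_norm(14.1, 2):.0f}%")   # two-node run of Section 6.4: ~115%
```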

6.3 (1A) DVS during I/O

As Section 5.2 suggests, we apply DVS to the I/O periods, such that during sending and receiving the Itsy node operates at 59 MHz, while for computation it still runs at 206.4 MHz. From our measurements, the communication delay does not increase at the lower clock rate. Thus the performance remains the same, with D = 2.3 seconds. Through DVS during I/O, the battery life is extended to 7.6 hours and the node is able to finish 11.9K frames. That is, T(1A) = Tnorm(1A) = 7.6, W(1A) = 11900, Rnorm(1A) = 124%, indicating a 24% increase in battery life. Note that W(1A) > W(0A) = 11500. Even though the Itsy node is communicating a large amount of data with the host computer, it completes more workload than it does in experiment (0A) without I/O. This is due to the recovery effect of batteries.
6.4 (2) Distributed DVS by Partitioning

Since there are no further opportunities for DVS with a single node, from now on we evaluate distributed configurations with two Itsy nodes in a pipeline. In Section 5.3 we selected the best partitioning scheme, in which the two Itsy nodes operate at 59 MHz and 103.2 MHz, respectively. The distributed two-node pipeline is able to complete 22.1K frames in 14.1 hours. That is, T(2) = 14.1, W(2) = 22100. Compared to experiment (1), the battery life is more than doubled. However, after normalizing the results for two batteries, Tnorm(2) = 7.05 and Rnorm(2) = 115%, meaning the battery life is only effectively extended by 15%. Distributed DVS is even less efficient than (1A), in which DVS during I/O extends battery capacity by 24%. There are a few reasons behind these results. First, when Node2 fails, the pipeline simply stalls while plenty of energy still remains in the battery of Node1. Second, Node2 always fails first because the workload on the two nodes is not balanced very well. Node2 has much more computation load and has to run at 103.2 MHz, while Node1 has very little computation, such that it operates at 59 MHz. However, this partitioning scheme is already optimal with the maximally balanced load. If we choose other partitioning schemes, the system will fail even sooner, as analyzed in Section 5.3.
6.5 (2A) Distributed DVS during I/O

DVS during I/O (1A) extends battery life by 24% for a single node. We expected the distributed pipeline to also benefit from applying DVS during I/O on the distributed nodes. Of the two Itsy nodes, Node1 is already configured at the lowest clock rate. Therefore, we can only reduce the clock rate of Node2 to 59 MHz during its I/O periods and leave it at 103.2 MHz for computation. The result is T(2A) = 14.44, W(2A) = 22600, Tnorm(2A) = 7.22 and Rnorm(2A) = 118%. Only 3% more battery capacity is observed compared with experiment (2).

Distributed DVS during I/O is not as effective as DVS during I/O for a single node. According to the power profile in Fig. 7, from (1) to (1A) the discharge current drops from 110 mA to 40 mA during I/O periods, which take up half of the execution time of the single node. However, from (2) to (2A), we only optimize Node2, which already operates at a low power level during I/O (55 mA). By applying DVS during its I/O periods, the discharge current decreases to 40 mA. Thus, the 15 mA reduction is not as considerable compared with the 70 mA saving in experiment (1A). In addition, Node2 does not spend a long time on I/O: it only communicates 700 bytes in very short periods. Therefore, the small reduction to a small portion of the power use contributes trivially to the system. On the other hand, Node1 has a heavy I/O load. However, since it runs at the lowest power level, there is no chance to further optimize its I/O power.

From experiments (2) and (2A) we learn a few lessons. Although a distributed system offers DVS opportunities that are not available on a single processor, the energy saving is no longer decided merely by the processor speed. With a single processor, minimizing energy directly optimizes the lifetime of its single battery. However, in a distributed system, the batteries are also distributed. Minimizing global energy does not guarantee extending the lifetime of all batteries. In our experiments, the load pattern of both communication and computation decides the shortest battery life, which often determines the uptime of the whole system.
6.6 (2B) Distributed DVS with Power Failure Recovery

In experiments (2) and (2A), the whole distributed system fails after Node2 fails, although Node1 is still capable of carrying on the entire algorithm. We attempt to enable the system to detect the failure of Node2 and reconfigure the remaining Node1 to continue operating. Our approach is described in Section 5.4. We use the same partitioning scheme as in (2) and (2A). Due to the additional communication transactions for control messages, both nodes have to run faster. As a result, Node1 must operate at 73.7 MHz, and Node2 at 118 MHz. We also perform DVS during I/O on both nodes. The result is T(2B) = 15.72, W(2B) = 24500, Tnorm(2B) = 7.86 and Rnorm(2B) = 128%. With our recovery scheme, the system lasts longer than in (2) and (2A). However, there is no significant improvement compared to the simple DVS-during-I/O scheme (1A). Since both nodes must run faster, Node2 fails more quickly, after completing 19.5K frames, and Node1 picks up another 5K frames until all batteries are exhausted. Power failure recovery allows the system to continue functioning with failed nodes. However, it is expensive in the sense that it must be supported with additional, expensive energy consumption.
6.7 (2C) Distributed DVS with Node Rotation

Up to now, the distributed DVS approaches do not seem effective enough. In experiments (2) and (2A), the failure of Node2 shuts down the whole system. Experiment (2B) allows the remaining Node1 to continue; however, the power failure recovery scheme also consumes energy before it can save energy. What prevents a longer battery life is the unbalanced load between Node1 and Node2. In this new experiment we implement the node rotation technique presented in Section 5.5, combined with DVS during I/O. Since there is no performance penalty, the two nodes can still operate at 59 MHz and 103.2 MHz. By rotating the nodes every 100 frames, the battery life can be extended to T(2C) = 17.82, W(2C) = 27900, Tnorm(2C) = 8.91 and Rnorm(2C) = 145%. This is the best result among all the techniques we have evaluated. Node rotation allows the workload to be evenly distributed over the network and thus maximally utilizes the distributed battery capacity. There is also an additional benefit: since both nodes alternate their frequency between 103.2 MHz and 59 MHz, both batteries can take advantage of the recovery effect to further extend their capacity.

To summarize, our experimental results are presented in Fig. 10. Both absolute and normalized battery lives are illustrated, with the normalized ratios annotated. The results of experiments (0A) and (0B) without communication are not included, since it is not proper to compare them with I/O-bound results. It should be noted that the effectiveness of these techniques is application-dependent. Although experiments (2) and (2A) do not show much improvement in this case study, the corresponding techniques can still be effective for other applications.
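The rotation schedule itself is simple bookkeeping, sketched below: every 100 frames the two nodes swap pipeline roles, so each battery alternates between the light 59 MHz role and the heavy 103.2 MHz role. The role-assignment code is illustrative, not our actual implementation.

```python
# Sketch of node rotation as run in experiment (2C): swap the two pipeline
# roles every 100 frames so both batteries discharge at similar rates.

ROTATE_EVERY = 100
ROLES = [("front", 59.0), ("back", 103.2)]  # the best partition from (2)

def role_of(node_id, frame_no, n_nodes=2):
    """Rotate the role assignment by one position every ROTATE_EVERY frames."""
    shift = (frame_no // ROTATE_EVERY) % n_nodes
    return ROLES[(node_id + shift) % n_nodes]

for frame in (0, 99, 100, 199, 200):
    print(frame, "node0:", role_of(0, frame), "node1:", role_of(1, frame))
# Each node spends half its time at each frequency, which also lets both
# batteries exploit the recovery effect during their low-current phases.
```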

7 Conclusion

This paper evaluates DVS techniques for distributed low-power embedded systems. DVS has been suggested as an effective technique for energy reduction in a single processor. As DVS opportunities diminish in communication-bound, time-constrained applications, a distributed system can expose richer parallelism that allows further optimization of both performance and DVS opportunities. However, designers must be aware of many tricky and often counter-intuitive issues, such as additional I/O, partitioning, power failure recovery, and load balancing, as indicated by our study. We presented a case study of a distributed embedded application under various DVS techniques. We performed a series of experiments and measurements on actual hardware with DVS under an I/O-intensive workload, which is typically ignored by many DVS studies. We also proposed a new load balancing technique that enables more aggressive distributed DVS and maximizes the uptime of battery-powered, distributed embedded systems.
REFERENCES
[2] K. Choi, K. Dantu, W.-C. Cheng, and M. Pedram. Frame-based dynamic voltage and frequency scaling for a MPEG decoder. In Proc. International Conference on Computer-Aided Design, pages 732-737, November 2002.
