Energy-Efficient Adaptive Clocking Dual Edge Sense-Amplifier Flip-Flop
Yen-Ting Liu, Lih-Yih Chiou, and Soon-Jyh Chang
Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan, Republic of China
Email: [email protected] and [email protected]

Abstract: In this paper, we propose two novel dual edge-triggered flip-flops. The first design eliminates redundant transitions of internal nodes when the current data is the same as the previous data. It has the smallest power delay product among the compared dual edge-triggered flip-flops over the whole range of data switching activity, and its delay is also the smallest. The second design disables internal clocked transistors. When data switching activity is within 20%, it has the lowest power consumption.

I. INTRODUCTION

In deep sub-micron technology, power density becomes substantial, and the accompanying problems of heat removal and cooling are worsening. Low power has gradually become one of the most important design considerations. In very-large-scale integration (VLSI) systems, the clock network is one of the most power-consuming components. It accounts for 20% to 45% of the total power dissipation in a single chip [1]. The total power dissipation of the clock network is given by

\[ P_{clk} = V_{DD}^{2} \left[ f_{clk} \left( C_{clk} + C_{ff,clk} \right) + f_{data} \times C_{ff,data} \right], \tag{1} \]

where VDD is the supply voltage, fclk the clock frequency, fdata the average data frequency, Cclk the total capacitance related to the clock signal excluding flip-flops, Cff,clk the total capacitance related to the clock signal connected to the flip-flops, and Cff,data the total capacitance of the flip-flops connected to the data input.

There are several ways to reduce clock power. The most influential is VDD scaling, which has a quadratic impact on Pclk. However, VDD has already been reduced along with process downscaling. The capacitance is unlikely to decrease as long as the number of transistors in a circuit keeps growing and functionality becomes more complex. One effective way to reduce fclk without performance degradation is to use dual edge-triggered flip-flops (DETFFs). A DETFF requires only half of fclk to maintain the same throughput as a single edge-triggered flip-flop (SETFF).

The two main categories of DETFFs are master-slave and pulse-triggered. Examples of master-slave DETFFs include the Transmission Gate Latch-Mux (TGLM) [2], the C2MOS Latch-Mux [3], and SSTC-C2MOS [4]. They place positive and negative flip-flops in parallel to perform dual edge triggering. These structures are straightforward. However, the internal nodes charge and discharge at every clock cycle regardless of the input, even when the same value is sampled. This wastes a lot of power. Examples of pulse-triggered DETFFs are the explicit-pulsed dual edge-triggered static hybrid flip-flop (ep-DSFF) [5] and the dual-edge conditional pre-charge flip-flop (DE-CPFF) [6]. Pulse-triggered flip-flops have a soft clock edge property that can absorb clock skew. However, they share one common problem: the power consumed by clocked transistors accounts for a major part of the total power.

To reduce power consumption when data switching activity is low, we extend the conditional capturing technique proposed in [7]. The first proposed pulse-triggered flip-flop is the conditional capturing dual edge sense-amplifier flip-flop (CDSAFF). To further overcome the common problem of pulse-triggered DETFFs, we present another novel design called the adaptive clocking dual edge sense-amplifier flip-flop (ACSAFF). ACSAFF has the advantage of conditional capturing and can disable some internal clocked transistors when data switching activity is low.

The rest of the paper is organized as follows. Section II presents and explains the operation of the proposed flip-flops. Section III describes the simulation testbench and presents the experimental results and discussion. Finally, Section IV draws the conclusion.

This work was supported in part by the National Science Council, Taiwan, under grant NSC94-2213-E006-033.
0-7803-9390-2/06/$20.00 ©2006 IEEE
ISCAS 2006

II. PROPOSED DETFFS

We propose a conditional capturing dual edge sense-amplifier flip-flop (CDSAFF) as shown in Fig. 1. The proposed circuit has three major parts: 1) the inverter chain that generates delayed clock signals; 2) the front-end core that samples data, which is based on the sense-amplifier flip-flop (SAFF); and 3) the SR latch consisting of two cross-coupled NAND gates. To achieve dual edge triggering, we use an inverter chain to produce delayed clock signals. CLK and CLK3 are both high for a short period of time on the rising edge of CLK, and so are CLK1 and CLK4 on the falling edge of CLK. Fig. 2 illustrates the timing relationship of the delayed clock signals. The shadowed regions represent the sampling window. Consequently, CDSAFF samples data only on the rising and falling edges of CLK.

Fig. 1. Conditional capturing dual edge sense-amplifier flip-flop (CDSAFF): (a) front end, (b) two cross-coupled NAND latch, (c) inverter chain.
Fig. 2. Sampling window by delayed clocked signals.

For the front-end core, we add two cascoding transistors, M5 and M8, to implement conditional capturing. If the D input is the same as the Q output, nodes SB and RB stay high and the cross-coupled NAND SR latch retains its previous outputs. If the D input differs from the Q output, the outputs of the front end are fed into the SR latch consisting of two cross-coupled NAND gates, which converts the falling pulses on SB or RB into static Q outputs. We use the fast NAND gates of [8]. Although they use more transistors, their response is faster.

Fig. 3. Inverter chain of ACSAFF (NC is from the front end).
Fig. 4. Timing diagrams of ACSAFF.

Although conditional capturing reduces power consumption, the inverter chain still consumes power even when the data switching activity is low. To further decrease the power, we propose a technique called adaptive
clocking. The adaptive clocking dual edge sense-amplifier flip-flop (ACSAFF) consists of the same three components as CDSAFF, but differs in the structure of the inverter chain, as shown in Fig. 3. We use the signal derived from node NC of the front end to control adaptive clocking. If the D input differs from the Q output, node NC is charged. When node NC turns on transistors MC1 and MC2, CLK feeds the inverter chain to produce the delayed clock signals. Node SB or RB forms a discharging path on the rising or falling edge of CLK. Once the outputs change, transistors M5, M6, M8, and M9 cut off the charging paths to node NC. At the same time, node NC is pulled down to ground through either M1 and M2, or M3 and M4. As a result, transistors MC1 and MC2 are turned off and the inverter chain is disabled.

We also add a PMOS transistor, MC3, to control node N1. While node NC is low, we pull CLK4 down to ground. Because transistor MC2 is turned off, CLK3, once pulled down, is no longer influenced by CLK. We pull down both CLK3 and CLK4 to preclude the front end from sampling data at times other than the rising or falling edges of CLK. Without this mechanism, either CLK3 or CLK4 will stay high. Whenever the D input then changes, a discharging path may exist. Fig. 4 shows the timing diagrams of ACSAFF. CLK3 or CLK4 is active only when the D input differs from the Q output. When designing the adaptive clocking, we do not pull down CLK3 and CLK4 directly through NMOS transistors, because that would create a short-circuit current path through the PMOS of the inverter and the NMOS controlled by the inverted NC signal.

III. EXPERIMENTAL RESULTS

A. Simulation Testbench

We adopt the testbench from [9], shown in Fig. 5, to compare the power consumption and performance of the proposed CDSAFF and ACSAFF with the other DETFFs in [2]-[4]. The outputs of each flip-flop are loaded with 14 inverters, since high-performance flip-flops are typically placed in critical paths with a relatively high average load. All flip-flops are optimized using logical effort to obtain the minimum power delay product. We measure the average power at a clock frequency of 250 MHz, with the rise and fall times of the primary input and clock equal to 100 ps. We use the TSMC 0.18 µm Mixed Signal SALICIDE (1P6M) technology with a 1.8 V supply voltage at 25 °C. All simulations are run in HSPICE on circuits extracted from the layout, including parasitic capacitance and resistance.

Fig. 5. Simulation testbench.

TABLE I
POST-LAYOUT POWER COMPARISONS (α = 25%)

Flip-flop          Internal power (µW)   Data power (µW)   Clock power (µW)   Total power (µW)
ACSAFF             62.7                  2.7               6.4                71.8
CDSAFF             77.3                  2.7               5.5                85.5
SSTC-C2MOS         85.9                  3.5               20.7               110.1
TGLM               53.8                  2.4               13.2               69.4
C2MOS Latch-Mux    56.8                  3.7               11.6               72.1

TABLE II
POST-LAYOUT PERFORMANCE COMPARISONS (α = 25%)

Flip-flop          Setup time (ps)   tDQ (ps)   PDP (fJ)
ACSAFF             81.9              308.3      22.1
CDSAFF             -52.8             165.9      14.2
SSTC-C2MOS         77.7              340.6      37.5
TGLM               129.5             349.7      24.3
C2MOS Latch-Mux    63.8              331.9      23.9

Fig. 6. D-Q delay as a function of D-CLK delay. [Axes: D-CLK delay (ps) vs. D-Q delay (ps).]
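As a rough numerical illustration of Eq. (1) at the testbench operating point (1.8 V supply, 250 MHz clock), the sketch below compares a single edge-triggered design clocked at 500 MHz with a dual edge-triggered design clocked at 250 MHz, which deliver the same throughput. The capacitance values and the data frequency are placeholders chosen only for illustration, not values from this work; the sketch also assumes that both designs present the same capacitances, which need not hold for real circuits. The function name `clock_power` is hypothetical.

```python
def clock_power(vdd, f_clk, f_data, c_clk, c_ff_clk, c_ff_data):
    """Evaluate Eq. (1) and return the clock-network power in watts."""
    return vdd ** 2 * (f_clk * (c_clk + c_ff_clk) + f_data * c_ff_data)


VDD = 1.8            # V, supply voltage used in the testbench
F_DATA = 62.5e6      # Hz, assumed average data frequency (illustrative)
C_CLK = 10e-12       # F, assumed clock-network capacitance (illustrative)
C_FF_CLK = 5e-12     # F, assumed flip-flop clock capacitance (illustrative)
C_FF_DATA = 2e-12    # F, assumed flip-flop data capacitance (illustrative)

# Same throughput: a SETFF clocked at 500 MHz versus a DETFF clocked at 250 MHz.
p_setff = clock_power(VDD, 500e6, F_DATA, C_CLK, C_FF_CLK, C_FF_DATA)
p_detff = clock_power(VDD, 250e6, F_DATA, C_CLK, C_FF_CLK, C_FF_DATA)

print(f"SETFF at 500 MHz: {p_setff * 1e3:.2f} mW")
print(f"DETFF at 250 MHz: {p_detff * 1e3:.2f} mW")
print(f"Clock power saved: {(1 - p_detff / p_setff) * 100:.1f} %")
```

Under these assumptions the term proportional to fclk halves while the data term is unchanged, so the saving is somewhat below 50%.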

B. Experimental Results

Tables I and II show the simulation results with 25% data switching activity (α). Internal power includes the power dissipated in switching the internal nodes of the flip-flop and driving the output load. Data power and clock power are the power dissipated in driving the input and clock capacitances, respectively. Total power is the sum of internal, data, and clock power. Table II shows the timing parameters and the power delay product (PDP). tDQ stands for the minimum D-to-Q delay. We use PDP (total power × tDQ) as our overall performance metric. TGLM consumes the least power, yet its delay is the largest of the five designs. The proposed ACSAFF consumes the second-lowest power. Although CDSAFF consumes relatively high power, its delay is the smallest, and its PDP is the smallest of all the compared designs.

The setup time of CDSAFF is negative, because the sampling window is still open for a short period of time after CLK goes high. From Fig. 4, the sampling window of ACSAFF has the same feature as that of CDSAFF, so the setup time of ACSAFF would be expected to be negative, too. However, because we apply adaptive clocking, the D input has to be stable before it can turn on the inverter chain, which increases the setup time.

Fig. 6 shows the D-Q delay as a function of the D-CLK delay. The D-Q delay characteristic of ACSAFF is very flat: variation in the D-CLK delay does not cause a large disturbance of the D-Q delay, so ACSAFF absorbs clock uncertainty well.

Fig. 7 and Fig. 8 show the power dissipation and PDP for different data switching activities. The proposed ACSAFF consumes the least power when α is within 20%, because there is no redundant charging or discharging of internal nodes. When α is 25%, ACSAFF still performs well: its power consumption is larger than that of TGLM by only 3.5%. CDSAFF consumes the least energy (PDP) across the different values of α. When α equals 25%, the PDP of CDSAFF is 20% less than that of TGLM, 18% less than that of C2MOS Latch-Mux, and 48% less than that of SSTC-C2MOS.
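The PDP column of Table II can be cross-checked from the total power in Table I and tDQ in Table II. The short sketch below does so using only the tabulated values; the names `RESULTS` and `pdp_fj` are introduced here for illustration.

```python
# (total power in µW, minimum D-to-Q delay in ps), from Tables I and II (α = 25%)
RESULTS = {
    "ACSAFF": (71.8, 308.3),
    "CDSAFF": (85.5, 165.9),
    "SSTC-C2MOS": (110.1, 340.6),
    "TGLM": (69.4, 349.7),
    "C2MOS Latch-Mux": (72.1, 331.9),
}


def pdp_fj(power_uw, delay_ps):
    """Power delay product in fJ (1 µW x 1 ps = 1e-18 J = 1e-3 fJ)."""
    return power_uw * delay_ps * 1e-3


pdp = {name: pdp_fj(power, delay) for name, (power, delay) in RESULTS.items()}

# List the designs from smallest to largest PDP.
for name, value in sorted(pdp.items(), key=lambda item: item[1]):
    print(f"{name:16s} {value:5.1f} fJ")

# Design with the smallest PDP, and ACSAFF's extra total power relative to TGLM.
best = min(pdp, key=pdp.get)
extra_power_pct = (RESULTS["ACSAFF"][0] / RESULTS["TGLM"][0] - 1) * 100
print(f"Smallest PDP: {best}")
print(f"ACSAFF total power over TGLM: {extra_power_pct:.1f} %")
```

Rounded to one decimal place, the recomputed products agree with the PDP column of Table II, and the power gap between ACSAFF and TGLM agrees with the 3.5% quoted in the text.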

Fig. 7. Power consumption for different data switching activity. [Axes: data switching activity (%) vs. power consumption (µW).]
Fig. 8. Power delay product for different data switching activity.

Fig. 9 shows PDP as a function of supply voltage when α is 25%. All flip-flops tend toward lower PDP at lower supply voltage, with the exception of ACSAFF. At low supply voltage, the voltage on the controlling node NC becomes too low to turn on the NMOS transistors, so the delay of ACSAFF increases much faster than that of the other flip-flops. At high supply voltage, the voltage on node NC increases, so the time needed to wake the inverter chain from sleep decreases. In addition, a higher supply voltage reduces the CLK-Q delay. A high supply voltage thus has a double impact on the D-Q delay of ACSAFF, so the rise of PDP with supply voltage is less steep. Nevertheless, within the worst-case supply deviation, 1.62 V to 1.98 V, ACSAFF still performs well.

Fig. 9. Power delay product as a function of supply voltage (α = 25%). [Axes: supply voltage (V) vs. power delay product (fJ).]

The weakness of adaptive clocking and conditional capturing is the extra controlling transistors, so ACSAFF consumes large power when α is 100%. In practice, however, data switching activity is between 0% and 25% [10]. Under these circumstances, ACSAFF is well suited to low-power and energy-efficient systems. For high-speed applications, CDSAFF can be used instead.

IV. CONCLUSION

This paper presents two energy-efficient dual edge-triggered flip-flops. CDSAFF implements conditional capturing to reduce unnecessary charging and discharging of internal nodes. ACSAFF applies adaptive clocking to stop the inverter chain from generating delayed clock signals. ACSAFF consumes the least power among the compared structures when data switching activity is within 20%. CDSAFF has the smallest PDP over the whole range of data switching activity.

REFERENCES

[1] H. Kawaguchi and T. Sakurai, “A reduced clock-swing flip-flop (RCSFF) for 63% power reduction,” IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 807-811, May 1998.
[2] R. P. Llopis and M. Sachdev, “Low power, testable dual edge triggered flip-flops,” in Proc. Int. Symp. Low Power Electronics and Design, 1996, pp. 341-345.
[3] A. Gago, R. Escano, and J. A. Hidalgo, “Reduced implementation of D-type DET flip-flops,” IEEE J. Solid-State Circuits, vol. 28, pp. 400-402, Mar. 1993.
[4] W. Chung, T. Lo, and M. Sachdev, “A comparative analysis of low-power low-voltage dual-edge-triggered flip-flop,” IEEE Trans. VLSI Syst., vol. 10, no. 6, pp. 913-918, Dec. 2002.
[5] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, “Comparative delay and energy of single edge-triggered & dual edge-triggered pulsed flip-flops for high-performance microprocessors,” in Proc. Int. Symp. Low Power Electronics and Design, 2001, pp. 147-152.
[6] N. Nedovic, M. Aleksic, and V. G. Oklobdzija, “Conditional pre-charge techniques for power-efficient dual-edge clocking,” in Proc. Int. Symp. Low Power Electronics and Design, 2002, pp. 56-59.
[7] B. S. Kong, S. S. Kim, and Y. H. Jun, “Conditional-capture flip-flop for statistical power reduction,” IEEE J. Solid-State Circuits, vol. 36, pp. 1263-1271, Aug. 2001.
[8] B. Nikolić, V. G. Oklobdzija, V. Stojanović, W. Jia, J. K. Chiu, and M. M. Leung, “Improved sense-amplifier-based flip-flop: design and measurements,” IEEE J. Solid-State Circuits, vol. 35, pp. 876-884, Jun. 2000.
[9] N. Nedovic, M. Aleksic, and V. G. Oklobdzija, “Timing characterization of dual-edge triggered flip-flops,” in Proc. Int. Conf. Comput. Design, Sep. 2001, pp. 538-541.
[10] C. Svensson and J. Yuan, “Latches and flip-flops for low power systems,” in Low Power CMOS Design, A. Chandrakasan and R. Brodersen, Eds. Piscataway, NJ: IEEE Press, 1998, pp. 233-238.
