Worst Case Noise

Published on January 2017 | Categories: Documents | Downloads: 41 | Comments: 0 | Views: 116
of 18
Download PDF   Embed   Report

Comments

Content

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 38, NO. 5, SEPTEMBER 1992

1494

Worst-Case Power-Constrained Noise for
Binary-Input Channels
Shlomo Shamai (Shitz), Senior Member, IEEE, and Sergio VerdG, Senior Member, IEEE

Abstract-Additive
noise channels with binary-valued inputs
and real-valued outputs are considered. The maximum error
probability and the minimum channel capacity achieved by any
power-constrained
noise distribution
are obtained. A general
framework which applies to a variety of performance measures
shows that the least-favorable noise distribution is, in general, a
mixture of two lattice probability mass functions. This framework holds for m-at-y input constellations on finite-dimensional
lattices.

l\
.
-1

Index Terms-Worst-case
noise, channel capacity, probability
of error, discrete-input channels.
-2 _

I. INTRODUCTION

E ERROR PROBABILITY and the capacity of
binary-input additive-noise channels are well known
-3 _
if the noise is Gaussian. A basic problem in communication theory is to find the worst-case performance achievable by any noise distribution as a function of the signalto-noise ratio. In addition to its applications to channels
subject to jamming (where the least-favorable power-con4
strained noise distribution is of interest), the worst-case
performance provides a baseline of comparison for any
non-Gaussian channel in which the receiver knows the
noise statistics. This paper gives a complete solution to
this problem for the two major performance measures:
dB
0
error probability and capacity.
Consider the binary equiprobable hypothesis testing Fig. 1. Gaussian (lower curve) and worst-case (upper curve) error
probabilities.
problem:
T

H, :

Y= +l+N,

HO:

Y= -1 +N,

(1.1)

shown in Fig. 1 (SNR = l/g2 > 0 dB) the worst-case
error probability admits a particularly simple expression:

where N is a real-valued random variable constrained to
satisfy an average-power lim itation E[N’] 2 a2. In the
P,(cT”) = q.
sequel, we find an explicit expression for the worst-case
error probability attained by the optimum (maximum-likelihood) detector, which knows the noise distribution. The
worst-case error probability is depicted in Fig. 1 along Thus, the worst-case error probability is equal to l/4 at 0
with its Gaussian counterpart Q(l/a>. In the region dB with every decrease in error probability by one order
of magnitude requiring an additional increase of 10 dB in
Manuscript received September 20, 1991; revised January 28, 1992. signal-to-noise ratio. The noise distribution that maxiThis work was supported in part by the Office of Naval Research under
m izes error probability is shown to put all its mass on the
Grant N00014-90-J-1734.
S. Shamai (Shitz) is with the Department of Electrical Engineering, integers (-M;..,
Ml, where M depends on the signal-toTechnion-Israel Institute of Technology, Haifa 32000, Israel.
noise ratio and the weight assigned to each of those
S. Verdi is with the Department of Electrical Engineering, Princeton
integers depends (in addition to the signal-to-noise ratio)
University, Princeton, NJ 08544.
IEEE Log Number 9108219.
only on whether the integer is even or odd. A simple sign
OOlB-9448/92$03.00 0 1992 IEEE

SHAMAI (SHITZ) AND VERDti:

-10

-5

I

WORST-CASE POWER-CONSTRAINED

5

10

15

: SNR,dB,
20

Fig. 2. Gaussian (upper curve) and worst-case (lower curve) channel
capacities.

detector achieves the minimum error probability for that
least-favorable distribution.
In addition, the paper finds the solution to the generalization of the worst-case error probability problem to
vector observations with arbitrary binary signaling.
It is well known [ll, [lo] that the capacity of the additive
memoryless channel
yi = xi + Ni
is minimized, among all power-constrained noise distributions, by i.i.d. Gaussian noise, if the input is only constrained in power, in which case the optimal input is also
iid Gaussian. If the input is binary valued (a ubiquitous
constraint in many digital communication systems, such as
direct-sequence spread spectrum) the worst-case capacity
as well as the least-favorable noise distribution were previously unknown (beyond the fact shown in [3] that the
worst-case noise is distributed on a lattice). The worst-case
capacity of the binary-input channel is depicted in Fig. 2
along with the Gaussian-noise capacity. At low signal-tonoise ratios, both curves coincide as anticipated in [5].
(For low signal-to-noise ratios binary signaling is an
asymptotically optimum power-constrained input strategy.)
The difference between the Gaussian and worst-case capacities is at most 0.118 bit. Compound channel capacity
results (e.g., [331)lend added significance to the worst-case
capacity as it equals the channel capacity when the encoder and decoder know only the signal-to-noise ratio but
not the noise distribution. Moreover, memoryless input
and memoryless noise constitute a saddle-point of the
game between communicator and interferer with mutual
information as the payoff function [22], [30]. Thus, the
value of that game is given by the worst-case capacity over
all iid noise distributions.
The foregoing results are obtained as an application of
a general framework developed in Section II, which applies to many other performance functionals of information-theoretic interest besides error probability and capacity, such as divergence, cutoff rate, random-coding error
exponent, and Chernoff entropy. Those general results
show that the worst-case performance functional is given
by the convex hull of the functional obtained by minimiz-

NOISE FOR BINARY-INPUT

CHANNELS

1495

ing only over power-constrained noise distributions which
place all their mass on a lattice whose span is equal to the
distance between the two inputs. This implies that the
least-favorable distribution is, in general, the mixture of
two lattice probability mass functions. This conclusion can
actually be generalized to m-ary input constellations on
finite-dimensional spaces, as long as the input constellation puts its mass on a lattice. Then, the least-favorable
noise distribution is a mixture of two distributions on
lattices that are shifted versions of the input lattice.
We now review some of the most relevant previous
works from the extensive literature on worst-case noise
distributions. In contrast to our work, the framework of
most of those results is game-theoretic and a saddle-point
solution is sought, providing a guaranteed performance
level for both the communicator and the interferer. For
the power-constrained mutual-information game, i.i.d.
Gaussian input and noise constitute a saddle-point (see
[l], [3], [lo], [21], [22] and the references therein). It was
shown in [3] that if the input is binary equiprobable, the
power-limited noise that minimizes mutual information is
discrete with atoms located on a lattice whose span is
equal to the distance between the inputs. However, the
location of the lattice, the probability masses and the
worst-case mutual information have not been reported,
except for asymptotically low signal-to-noise ratios in
which case [5] verified that the least-favorable noise tends
to a Gaussian structure. If the output is quantized to a
finite number of levels, then the least-favorable powerconstrained noise takes a finite number of values [15],
[29]. Other cases with input and/or output quantization
are considered in [3].
Mutual-information games arise in the context of compound channels where the noise statistics are unknown at
encoder and/or decoder (e.g., [Sl, 1311,[331 and the references therein) and in the context of arbitrarily varying
channels (e.g., [B], [9], [ll], [17] and the reference therein).
The results of [16] on channels with unknown memory
provide further motivation for the study of the worst-case
capacity of the scalar memoryless additive-noise channel.
An error probability game for the binary hypothesis test
in (1.1) has been considered in [24]-[26], [28]. The main
differences between those works and our problem are the
consideration of amplitude constraints instead of power
constraints and the fact that a minimax test is sought
which does not know the noise distribution. As we discuss
in Section III, those differences make both the nature of
the solution and the method of attack profoundly different from ours. The study of the bit error rate of symbolby-symbol demodulation of signals subject to intersymbol
interference leads [13], [14] to the consideration of a
worst-case error probability problem, where the additive
interference consists of the sum of Gaussian noise and an
amplitude- and power-constrained independent noise.
and approach are quite
Again, the solution
different from those of the present paper because of the
additional Gaussian noise, and, more importantly, because for the purposes of [131, [141 attention must be

1496

IEEE TRANSACTIONS

restricted to a sign detector, instead of the maximum-likelihood detector.
The organization of the rest of the paper is as follows.
Section II proves that in binary-input channels the worstcase power-constrained noise (for a wide variety of performance measures) is, in general, the mixture of two lattice
probability mass functions and gives a general characterization of the worst-case performance measure. Those
results are then used in Sections III and IV to solve the
worst-case error probability and capacity problems, respectively. Section III gives closed-form results for the
least-favorable noise distribution and the worst-case error
probability (Fig. 1). The single-sample hypothesis test in
(1.1) is extended to a multisample (vector) problem with
arbitrary binary signalling, allowing arbitrary dependence
among the noise samples. The least-favorable multidimensional distribution has the same marginal as in the singlesample case whereas its memory structure coincides with
that of the Gaussian variate that minimizes the signal-tonoise ratio at the matched-filter output. The worst-case
error probability curve of this general case is the same as
in the single-sample case (Figure 1). Section IV considers
input-output mutual information with binary equiprobable inputs as the performance functional. The leastfavorable probability mas function is found as the solution
of a pair of recursive equations, coupled by their initial
conditions. The worst-case capacity achieved by such a
distribution is depicted in Fig. 2. At low signal-to-noise
ratio, binary output quantization is known to incur in a
power penalty of 2 dB if the noise is Gaussian. We show
that the maximum degradation from Gaussian capacity
due to output quantization and non-Gaussian noise is a
factor of 3 (power loss of 4.8 dB), achieved when the
additive noise is uniform. In both Section 3 and 4 we
study an issue of practical importance in jamming applications: how to robustify the choice of the least-favorable
noise distribution so that exact knowledge of the input
alphabet by the jammer is not necessary in order to
approach worst-case performance. Finally, Section V extends the results of Section II to m-ary input constellations on finite-dimensional lattices.
II. WORST-CASE NOISE DISTRIBUTION
The probability measure defined by the noise random
variable N is denoted by
P,(B)

ON INFORMATION

1992

where d: R+ x Rf -t R, v is an arbitrary reference meaPNpl<v
and pN
sure such that P,,,+ 1 QV, P,,,*v,
denotes the density
PAZ)

= Z(z).

For the performance measure (2.1) to make sense, it is
necessary that it be invariant with respect to the reference
measure, in which case we can write

This condition is equivalent to restricting the kernel d to
be positiuely homogeneous (e.g., [27, p. 281):
4 Px, BY> = Pd(x,y)>
(2.4
for all p > 0, x 2 0, y 2 0. In addition to this property,
our development will only require that the kernel d be
convex. Henceforth, we shall assume that the kernel is
both positively homogeneous and convex.
Some examples of positively homogeneous convex kernels d of interest are the following.
1) If X is equal to + 1 with probability CYand equal to
- 1 with probability 1 - (Y, then the input-output
mutual information is
Z( X; X + N) = Z( PN)
with (Fig. 3)
d(x,y)

= c-wxlog

x

ax + (1 - cx)y
+(l

2) The (Kullback-Leibler)
and PNeI is
wp,+

- CX)ylog

Y

c4!x+ (1 - CX)y’

divergence between PN+ 1

III%1)

= Z(Piv)

with
d(x,y)

=xlog;.

Stein’s lemma (e.g., [21) states that this performance
measure is the rate of exponential decrease of
P[H,IH,] when P[H,IH,]
does not exceed an arbitrarily small positive constant for the i.i.d. test:

= P[N E B],

for any Bore1 subset B of the real line. If the output of
the channel is Y = X + N and X takes values + 1 and - 1
only, the achievable performance depends on the similarity between the measures PN+ 1 and PN- 1. In this section
we consider a wide class of functionals that quantify the
“distance” between PN+ 1 and PNpl, namely those that
can be written as

THEORY, VOL. 38, NO. 5, SEPTEMBER

H,:
Ho:

(Yl, -0. Y,) = (Nl + l,... N, + l),
(Yl, *** Y,) = (Nl - 1, ..a N, - 1).

3) The random-coding exponent functional for the binary input channel is [12]
J% PI = -l%(

-Z(P.v))

with
d(x,y)

= -(ax

l/l+P + (1 - n)y’/l+P)l+~.

The special case p = 1 corresponds to the channel
cutoff rate.

SHAMAI (SHITZ) AND VERDlj:

WORST-CASE POWER-CONSTRAINED

NOISE FOR BINARY-INPUT

CHANNELS

1497

Fig. 3. Kernel for mutual information with equiprobable inputs.
Fig. 4. Kernel for error probability with equiprobable inputs.

4) The error probability of maximum likelihood detection for the equiprobable binary test in (1.1) is

ZJ, = 1 - Z(P,)
with (Fig. 4)
d(x,y)

inf

Phi discrete
M,(P,)Iu~+E

= tm=(x,y).

5) The variational distance, or L, distance, between
P Nfl and ‘N-1 is equal to Z(P,) with

d(x,y) = Ix -yl.
The L, distance is actually an affine transformation
of the distance measure in Example 4:
Z(P,)

= 2 - 4P,.

6) The Chernoff entropy (e.g., [4]) or Hellinger transform (e.g., [6]) is equal to -Z(P,> with
d(x,y)

‘(‘I+)

M,(P,)S

I(‘,)

(2.3)

a2

with M,(P) = lx2 dP(x).
First, we present an auxiliary result which shows that
we may restrict our search of least-favorable noise distributions to discrete distributions.
Theorem 1:

inf

PN discrete
M,(P,k
v2

Z(P,).

5 w(c2).

(2.5)

(a

(2.6)

‘(‘N)

and whose po\?reris as close as desired to that of N.
To define N, let us partition the real line in intervals
Zi = ((i - 1)/M, i/M] for an arbitray positive integer M.
Then, the discrete random variable N is
if N E Zi.

(2.7)

Note that ff + 1 and i$ - 1 take values on the same
lattice. It follows from (2.1) that we can write
Icp.$) =
=

f d(Pdzi-M),
i= -cc
fi d( l$x
i= -cc
/ 4PNcX

s i-,,

W(02) =

s

&-L,

The objective is to minimize the function Z(P,) under a
second-moment constraint on the noise distribution:
5;

z(p,)

To that end, for every N we will find a discrete i
quantized version of N) such that

= -x1-*y”,

where 0 < (Y < 1. In the special case (Y = l/2,
-log(-Z(P,))
is the Bhattacharyya distance which
is directly related to the Hellinger distance, whose
square is Z(P,> with d(x, y> = l/2(&
- &>“.

W(a2) p

right-hand side of (2.4). By the continuity of convex functions on open intervals, it suffices to show that for all
E > 0,

P.dzi+hf))

- 1) dv(x),

+ 1) dv(x)

1

x - l)~p,(~
i.d(pN(
I

+ 1)) dv(x)

(2.4)

Proofi The convexity of d implies the convexity of Z
which in turn implies the convexity of W and of the

= z(p,),

(2.8)

where the inequality follows by convexity of d. The excess
power of the quantized version of N can be easily

IEEE TRANSACTIONS

1498

-x2]

dpN(x)

I

THEORY, VOL. 38, NO. 5, SEPTEMBER

1992

Even though the assumptions of kernel convexity and
positive homogeneity are not sufficient, as we shall see, to
claim that the worst-case noise takes values on a span-2
lattice, it is indeed true that W(cr 2> follows immediately
from V(cr 2> as the next result demonstrates.

bounded:

E[g2]- EW21
= ij&

ON INFORMATION

Theorem 2:

5 icl $4L2’

-

‘1 dpN(x)

where convf denotes the convex hull of the function f
(the greatest convex function upper bounded by f).

I

s-.

20
M2

Now, fix an arbitrary E > 0 and select M2 > 2c/e. For
such a choice of A4 and E, we have
{PN: E[N2]

I a2} c {PN: E[G2]

I a2 + e}.

(2.15)

= conv V(a2),

W(a’)

(2.10)

Prooj? Every discrete distribution on the real line can
be put as a (possibly countably infinite) mixture (i.e.,
convex combination) of distributions on span-2 lattices
IL,, E E [O,2)). In other words, for every discrete PN, we
can find a countable set {Ed, e2, *a*} c [O,2) such that

Thus,

(2.16)
inf

s

‘(‘N)

P,,, discrete
M2(PNko2+~

inf
3:
M,(P,$)<

Z(Pti)

inf

5

where

fT2+ E

aj =

Z(P$)

PN-:
M,(P,)
I c2
<

(2.11)

W(a2),

Having shown that it is sufficient to restrict attention to
discrete noise distributions, we proceed to consider a
specific class of discrete distributions, namely those whose
atoms lie on a span-2 lattice L, = {E + 2k}z= --m, E E
[O,2). Notice that if N E L,, then X + N E LcE+ Ijmod2 =
i.e., the output belongs also to a span-2 lattice
-$-l)mod2~
because the separation between the inputs is equal to the
span of the noise lattice.
Define the subset of doubly infinite nonnegative unitmass sequences E = (p: pi 2 0, CT=-mpi = 11, and the
functional
(2.12)

ii d(p,,pi+,)
i= --m

with p E E. Let Q,(p) be the probability mass function
on the lattice L, that assigns mass pk to the point 2k + E.
Were we to restrict attention to span-2 lattice distributions, the resulting optimization problem would be
V(a2) k

inf

EE[0,2)

inf
PEE
m
C (E + 2i)‘pi

PEE
var (p) 2 (T */4

pij’ = ;t-P,((2k

+ ej}).

(2.18)

I
Conversely, every discrete probability measure cll = (LY~,
(Ye,0.. ), E = (Ed, e2, **a) and (p(l), pc2),*** > defines a probability mass function through (2.16). Let us consider the
second moment and objective function achieved by such a
discrete distribution:
M2(pN)

=

e

(2.19)

olipj

j=l

with
co
p.J =

c

kc -co

(2k + •)~p(j)
I
k

(2.20)

and if ( el, l 2, **. ) are all distinct,
Z(pN) = 5
j=l

pN({Ej

= f

Z<QA P))

j=l

5 d(P,({ej
k= --m

+ 2kl),

f 2k + 2)))
2 d( ajp!j’,
kc -co

ajp&)

5 uz

(=-cc

inf

(2.17)

Lej)

and

where the inequalities follow from the fact that P$ is
discrete, (2.10), and (2.8), respectively.
•I

J(P) =

pN(

J(P),

(2.13)

where the last equation easily follows from the invariance
of the objective function Z to the location of the lattice:

~(Q,(P)) = J(P)-

(2.14)

= E ajJ(p(j)),

(2.21)

j=l

where the last equation follows from the positively homogeneous property and (2.12). The linearity properties in
equations (2.19) and (2.21) would be enough to conclude
the proof if it were not for the fact that (Ed, e2, *.. ) are
assumed to be distinct. From Theorem 4, (2.16), (2.19),

SHAMAI(SHITZ)AND

VERIH~:~~R~T-~ASE

POWER-CONSTRAINED

(2.20), and (2.20, we obtain

W(2)

=
m

(2.26)

5 ajJ( p”‘)
j=l

where

m

caj
,=I

inf
? (])EE
‘.,=I
...

ql,.

c
k=

1499

veniently as a two-step minimization:

inf inf
cx

NOISE FOR BINARY-INPUTCHANNELS

(2k + Ej)pp)

5 CT2

v,(oJ) =

-cc

inf

J(P).

POE

(2.27)

m

2 inf
cx

i;f

C aj

,c

inf
EjE[0,2)

j=l

CXTp=iT2

J( p”‘)

inf
&DEE
,g

m

(Ej + 2k)‘PP

5 Pj

(2.22)

where the inequality follows by dropping the restriction
that all ej be distinct, and the last two equations follow
from the definition of I’ and of its convex hull respectively.
To show the converse inequality to (2.22), we will use
the fact that according to the definition of conv V( u 2), for
every y > 0 and a2 > 0, there exist distributions P, and
P2 on respective span-2 lattices such that
d42(P~)

+ (1 - cu)M,(P,)

I u2

(2.23)

and
y+ convV(g+)

2 aI

+ (1 - a)Z(P,)

2 Z(aP,

+ (1 - a)P2),

(2.24)

+ E)2 5 a2

The optimization in (2.27) is a convex program which
can be solved using the constrained theory of global
optimization (e.g., [19]). Once the family of functions {V,,
0 5 E I 11 is obtained, W(a 2, can be computed via (2.15)
and (2.26), or simply by taking the convex hull of (V,,
0 2 E I 1) because the convex hull of the pointwise infimum of an arbitrary collection of functions is the convex
hull of the collection [27, p. 351. The analysis of the
worst-case noise problem for the specific performance
measures of error probability (Section III) and capacity
(Section IV) will provide an illustration of the derivation
of I’(a2) via (2.13) and via (2.26), respectively.
Theorem 2 shows that the least-favorable noise distribution is, in general, a mixture of two probability mass
functions on two span-2 lattices whose respective locations may depend on the noise variance. It is possible to
further characterize the worst-case payoff function and
the least-favorable noise in those problems where the
kernel is not only positively homogeneous and convex, but
it is smooth enough for the convex optimization problem
in (2.27) to be solvable via the Kuhn-Tucker theorem.
Theorem 3: Suppose that J is Gateaux differentiable
with convex Gateaux differentials. Then,

/V(02)T

for some 0 I (Y I 1. Therefore, for every y > 0 and a2 >
0, there exists a discrete distribution P, = UP, + (1 a>P2 such that
M,(P,)I

P,(2k
m

W(u2)
=I if a2 2 1,
ifO<a2<1.

min{V(u2),V(0)

u2

+ a2(V(l)

- V(O))],

\

(2.28)

and
y + conv V(a2) 2 Z(P,).

Proof: Every value of W( (T 2> = conv V( u 2> can be
written as the convex combination of two points:

Hence, we must conclude that
convI/(a2)

2 W(a2).

convV(a2)

(2.25)
q

The key step in the proof of Theorem 2 is the linearity
property (2.20, which holds as a consequence of the
positive homogeneity of the kernel. From an intuitive
viewpoint it says that regardless of the distribution of N,
the observation Y = X + N inherently carries the side-information of the identity of the span-2 lattice to which the
actual value of the noise belongs. This follows because the
input itself takes values on a span-2 lattice.
Theorem 2 reduces the optimization in (2.4) to the
conceptually simpler one in (2.13). A potential difficulty
for the direct solution of (2.13) is the nonconvexity of its
feasible set. However, (2.13) can be expressed more con-

= aV,,Cu,“)

+ (1-

~)V,,(C,“)

(2.29)

with
u2 = au,2 + (1 - a)u,2.

(2.30)

If u2 is such that it suffices to take or = e2 and

2 = u22 = u2 in (2.29) and (2.30), then V(u2> =
conv V( u 2). Assume, on the contrary, that we can choose
u2 > 0 such that V(u2> > conv V(u2>. Then find e1 Z e2
(+l

and u:<u2<u;

which satisfy (2.29) and (2.30). Let

p* E E and q* E E be such that they attain the infimum
in (2.27) for (Ed, u:) and (e2, ~2) respectively. The infimum in (2.27) is indeed attained because of the compactness of its feasible set [18] and the continuity of the
convex payoff function.

IEEE TRANSACTIONS

1500

Define the minimum payoff achievable with a noise
distribution which puts its mass on the union of the two
lattices LE1 U LEZ:
T(P)

inf
a Pa4

=

J(P) + J(q)-

c
Pk + 4k = 1
k= -I/
2k + E,)‘JQ
,z

+ (2k + e2)‘qk

I p

u(

ON INFORMATION

THEORY, VOL. 38, NO. 5, SEPTEMBER

1992

tion, thereby showing
convl/(fl*)

= V(a’)

in that region. If g* < 1, then the above argument (applied to both p* and q*) also leads to a contradiction
(which implies that conv V(a *) = V( CT*>) unless p* has
only one nonzero atom, and Q1(q*) has only two nonzero
atoms, at - 1 and 1. This is possible only if a: = 0 and
0
c2 2 = 1. Hence, the second part of (2.28) follows.

(2.31)

When the sufficient conditions of Theorem 3 are satisfied and the kernel is such that the distribution that
According to (2.29) and the definition of p* and q*, the achieves the minimum in (2.13) must have more than two
infimum in (2.31) for p = g2 is attained by the argument nonzero atoms regardless of a* (this is the case for
((up*, (1 - a)q*). Moreover, (p*, 0) and (0, q*) achieve capacity (Section IV)) then it readily follows from the
the infimum in (2.31) for p = CT: and p = CT:, respec- foregoing proof that W((T*) = V( (r*) and the leasttively. This is because if, say, the feasible point (p’, q’) favorable noise has a single-lattice probability mass funcwere to achieve lower payoff than (p*,O), then (ap’, aq’ tion.
+ (1 - a)q*) would achieve lower payoff than (a~*, (1 It is shown in Appendix B that if the kernel is symmetcw)q*).
ric and twice differentiable then the value E = 0 (lattice at
Now using the assumption in the theorem, 119,p. 2491, ( . . . -2, 0, +2, **a>>is locally optimal in the minimizait follows from the convexity of the optimization in (2.33) tion’of (2.13). The value E = 1 (lattice at ( v-e, -3, -1,
that we can find Kuhn-Tucker coefficients h,(p) and + 1, + 3, e.0)) is also locally optimum provided that CT2 1
h,(p) such that for every i with @ > 0, the following and that
equations hold:
cS2d(o, w)

Ji(q*)

+ A&?)

+ A,(o-*)(2i

+

l 2)*

= 0 (2.32)

6x 6y

-c 0,

for 0 < w 5 1.

and
J,(q*)

+ A,( CT,‘) + A,( a,2)(2i

+ E*)* = 0, (2.33)

where Ji denotes the partial derivative of .7 with respect
to its ith argument and we used the positive homogeneity
of J in order to substitute
Ji((l - a)q*)

= Ji(q*).

(2.34)

Since U* < CJ,“,we must have h,(o*) # h,(gz) [19, p.
2221(cf. similar argument in [3]) which is compatible with
(2.32) and (2.33) only if q* has either just one atom or two
atoms at points 2j + e2 and 21 + l 2 such that (2j + l 2)*
= (21 + E*)*. This can occur only if 2j + l 2 = -(2Z +
c2) = k E {l, 2, **. }. B ecause of the homogeneity and convexity of d, we can rule out each of those cases except the
one corresponding to k = 1 (e2 = 1, j = 0, 1 = - 1). For
example, if k = 2, then l 2 =O, j=l,l=
-l,andthe
payoff is
J(q*) = 40, cC1) + d(q*,,O)
2 4% Cl)

+ W,qT)

+ d(q+l, d) + d(qT,O)

+ d(qT,O)
(2.35)

because of convexity and homogeneity. The right side of
(2.35) is the payoff of the noise distribution that puts mass
q?, at - 1 and qT at 1 and 0 elsewhere. That distribution
has strictly lower variance and not higher payoff than the
original q* thereby contradicting the optimal&y of q*.
If the optimum distribution places nonzero mass only at
- 1 and 1, then its second moment is equal to 1. Therefore, if (T* 2 1, then ~2’> 1 and we arrive at a contradic-

III. PROBABILITY OF ERROR
The results of Section II are applied here to the kernel
d(x,y)

= k m={x,y).

(3.1)

The outcome of this analysis will be the characterization
of the power-constrained noise distribution that maximizes the error probability of a single-sample maximumlikelihood test between the equiprobable hypothesis (Example 4)
H,:
Y= +l +iv,
Ho: Y= -1 +N.
As we saw in Example 5, such least-favorable noise distribution minimizes the variational distance between P,,,, 1
and PNpl.
In addition to its practical relevance, this problem serves
as a good illustration of a case where I’(/(a*), the minimum payoff achieved by a single span-2 lattice, is not a
convex function in any interval of the positive real line.
This shows that neither the convex hull in the statement
of Theorem 2 nor the smoothness condition in Theorem 3
(violated by (3.1)) are superfluous, as one might have been
tempted to conjecture at first glance. Furthermore, it
demonstrates that our general framework can lead to
explicit solutions even if the payoff function is such that
conventional functional optimization tools (such as the
Kuhn-Tucker theorem) are not applicable in the solution
of (2.27).

SHAMAI

(SHITZ) AND VERDTj: WORST-CASE POWER-CONSTRAINED

Theorem 4: For k = 1,2;**, let
ak

2 A $2

- 1).

(3.2)

NOISE FOR BINARY-INPUT

Pi-l

P,(d)

= f -

3

u2 - uk”
+ ?i k(k + 1)(2k + ,l) ’

2Jm

(3.3)

Proof The following result gives the worst-case probability of error achievable when the noise takes values on
a single span-2 lattice.
Lemma 1:

Pi+1 = ‘**

>

(3.9)

O

= Pi+r

(3.10)

> Pi+r+l*

The existence of 1 is guaranteed if (3.9) is satisfied because the sequence {pi, i 2 O] is summable and monotone
nonincreasing. Choose 0 < 6 < min{pi+l -~~+~-~,p~-~
- pi} and let
pi’ =pi

+ 8,

PI*I = Pi+1 - ‘*

1k2--

r-is?

1 -J(P)

= y=+

m~{PI~PI+J

(3.4)

mmIPl+,,

where k E {1,2, .*a} and /3 E [0, l] are such that
a2 = ;(k

+ P)(k

+ 2 - 3/3)

(3.5)

(gz I g2 I gk2,1) and U c E is the subset of distributions p such that for some integers IZ < m

0 <p,-I

< var(p)

P-6)

5 J(P)*

(3.7)

and
J(P’)

Fix p E E - 6. Since both the function J and the variance are shift-invariant, we can assume, without loss of
generality, that p is centered so that its mean satisfies
C~~m_,ipi

I

P!+l+lI

=

+

max{Pi+l,

6,

Pi+r+lI

-

‘3

ifi<n-lori>m,

Pi = O,

Proof of Lemma 1: Let fi be the closure of U under
the operation of taking mirror images (swapping pi and
pei>. For every p E E - U, we will find p’ E E, such that
Pf + P,

var(p’)

=mm{PiyPi+d

whereas all other pairwise maxima remain unchanged
and, thus, (3.7) is satisfied (with equality).
Finally, if p E E - U is such that no i 2 0 satisfies
either (3.8) or (3.9), then repeat the procedure with’the
mirror image of p and the sought-after i 2 0 will then be
found unless p is of the form

ifi<nori>m,
Pj = O,
Pn = *** =pm-I ZP,.

<p,

= *** =pm-l

2p,

> 0,

in which case it can be put as a convex combination of an
element p’ E U and its mirror image. Then it is immediate to check that var(p’) = var(p) and J(p’) = J(p).
This concludes the proof of the first identity in (3.4).
To show the second identity, we need to obtain J(p)
and var (p) for p E U. The general form of p E U is
[.*.O,p;..,p,l
- kp,O;**] with k E {1,2;..] and p E
[l/(k + l), l/k]. Instead of using k and p to parametrize
p it will be convenient to use k and /? E [O, 11 defined as
p = kp(k + 1) - k.

Now, p achieves

3.

Select, if possible, i 2 0 such that

J(P)

Then, choose 0 < 6 < (pi+ r - pi)/2,
p; =pi
Pf+l =Pi+l

and let

i,

+ 6,
-

= 6(2Cjpj

var(p)

(3.11)

+ 1)

+ (1 - kp)(k

+ 1)’

i=l
2

m=W1,pIl

5 m~iPj-l,Pjl

+ 6,

mdPI,PI+lI

=

-

p ;i
I

+ (1 - kp)(k

i=l

6,

= ;(k
max{Pi+l~Pi+*I~

2k(k

=p ii*

-

s

P+k

and

- 2i - 1) - s2

mmIPi7Pi+,J

m~{Pi7Pi+lI

+p]

1
=-+
2

‘7

I -s*,
which means that (3.6) is satisfied. Besides,

max{PI+17P!+21

f

1=--m

= $1

while keeping p,! = pj for all other j. Then,
- var(p)

=

(3.8)

Pi < Pi+l.

var(p’)

>Pi+l

and let I E {l, 2, ..* } be such that

var (p) 2 a2/4

+<

>Pi

Again, (3.6) is satisfied and

1 - V((T2) =

-

1501

whereas all other pairwise maxima remain unchanged,
and, thus, (3.7) is also satisfied.
If no i r 0 satisfies (3.8), select, if possible, i 2 0 with

The worst-case probability of error is
1

CHANNELS

+ P)(k

+ 2 - 3p),

+ 1)

1
(3.12)

1502

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 38, NO. 5, SEPTEMBER 1992

whereupon the second inequality in (3.4) is proved concluding the proof of Lemma 1.
0
Proof of Theorem 4 (continued): Using the result of
Lemma 1, it is easily shown that V((T*> is strictly concave
on {u,“, uk”, r1 (see Fig. 5). Thus, V((r’) is not convex, and
conv V(cr *> = 1 - P&u *) is obtained as the convex hull
of the points {V(u:), k = 1,2, a*.}. Furthermore,
= 1 - V(C$)

P,(q!)

(3.13)

because P,(O) = 0 and the slopes of the segments uniting
successivepoints are strictly decreasing:

P,(4+1> -CM>
uk: 1 - &-

2

0

1

=-

02

(3.14)

4
02

6

8

Fig. 5. Worst-case error probability as 1-conv V((s2>.

4ii*
i=l

Thus it follows that for u2 = auf + (1 - c~)az+,, CYE
(0, 0,
Pe(u2) = e?(4)
+ (1 - 4P,<4+1>
and the theorem is proved.

0

Note that
1
--~
2

1
2&P

-6

I P,( 2)

1

I 2 -

2&E-i

1

’ (3.15)

where the lower bound is the error probability achieved
when N is a zero-mean uniform random variable and the
upper bound is a strictly concave function which coincides
with P,(u*> at u* = ut, k = 1,2;*..
We conclude from Theorem 4 that a single span-2
lattice achieves the maximum probability of error only
when the allowed noise power is equal to ut = i(k* - l),
k = 1,2, -0. . Those worst-case distributions are symmetric
and distribute their mass evenly on k atoms. (Those
atoms are located at 0, + 2, + 4, *** if k is odd and at
*1, * 3,*** if k is even.>When the allowed noise power
lies strictly between ul < u * < uk”, r, then a single span-2
lattice is no longer least favorable. Instead, the worst-case
distribution is the unique span-l lattice that is a m ixture
of the span-2 lattices that are least-favorable for u; and
uk”, r with respective weights (u 2 - uk”, r)/( 02 - ui+ r)
and (u,” - ~*>/(a~ - u;+r) (Fig. 6). In particular, if
SNR > 0 dB (u2 < 0, then the worst-case noise is symmetric with nonzero atoms at - l,O, + 1, i.e., the channel
becomes a symmetric erasure channel. Note that for low
signal-to-noise ratios, the worst-case noise cdf does not
become asymptotically Gaussian, as m ight have been surm ised from capacity considerations. In fact, for high u*,
the Gaussian and worst-case error probabilities behave as
1
--2
and

respectively.

1
&Go-

-5

-4

-3

-2

-1

0

1

2

3

4

5

6

Fig. 6. Least-favorable (error probability) noise distribution for (T’ = 7
(SNR = -8.45 dB).

The nature of the least-favorable noise distribution
implies that a sign decision (decide HI if Y > 0, decide
H, if Y < 0 and arbitrary if Y = 0) is a maximum-likelihood rule (yielding the m inimum probability of error). It
should be noted that the pair (sign detector, worst-case
noise) achieves the maximin error probability solution, but
not the m inimax solution. For example, if u > 1, the
noise can make the error probability of the sign detector
reach l/2 by concentrating all its mass in atoms u and
- u in arbitrary proportion. Therefore, the game between
the detection strategy and the power-constrained noise
distribution has no saddle-point. This is the reason why
the problem considered here is not a special case of (and
results in a radically different
solution from)
the problem considered by [13] and [14] which seek the
amplitude and variance constrained noise N that maxim izes P[ IN + GI > 11, the error probability of a sign
detector in the presence of a background Gaussian noise
G.
In the summary published in [35], Y. I. Zhodzishskiy
announces a solution for a worst-case noise, error probability problem, under seemingly the same setting considered in this section. Unfortunately, no details of the
solution are given in [35].
The equally likely symmetric span-2 lattices on k points
that achieve the maximum error probability for uz remain optimal if instead of a power constraint E[i’V*] I at,
the noise distribution is chosen to satisfy an amplitude
constraint -A I N I A, where k - 1 I A < k. In this
case, however, the solution is not always unique as replacing the discrete distributions by the “picket-fence” contin-

SHAMAI (SHITZ) AND VERDfJ: WORST-CASE POWER-CONSTRAINED

NOISE FOR BINARY-INPUT

uous distributions of [24] achieves the same error probability.
As we remarked in the introduction, one of the main
applications of the results in this paper is to communication in the presence of jamming. The nature of the
solution found in this section may lead one to believe that
both the worst-case probability curve and the least-favorable noise distribution are crucially dependent on the exact
knowledge (from the jammer’s viewpoint) of the input
alphabet. Consider the case where the input values are
1 + 6, and - 1 + 6,. If the jammer knew S, and S,, then
its best power-constrained strategy is to use the leastfavorable discrete distribution found in this section on a
span 1 + (6, - So)/2 lattice in lieu of the integers. This
would lead to a worst-case error probability curve given by
P,((2g/(2 + S, - S,))2). Thus, small perturbations of the
input values lead to small deviations in the worst-case
probability. But what if the jammer does not know the
values of 6, and S,? If, for example, it assumes that
6, = 6, = 0, but in fact 6, # S,, then it is easy to see that
the optimum detector for the least-favorable distribution
found in this section achieves zero error probability. Since
in practice small uncertainties are inevitable, the usefulness of our results for jamming problems would be seriously limited, unless we can show that the least-favorable
distribution can be robustified so as to avoid the extreme
sensitivity exhibited above. To see that this is indeed the
case, we are going to obtain a lower bound on the worstcase error probability under the assumption that the noise
distribution is chosen without knowledge of 6, and 6,.
That bound is obtained by selecting a specific noise distribution and an optimum detector with side information.
Choose an arbitrary 0 < o0 < cr and let the noise be

out error, and the error probability is

N=N*

+N,,

where N * and N, are independent, N * is the worst-case
noise distribution with power o2 - ~0’ found in this
section when the input is ( + 1, - 11 and N, is zero-mean
Gaussian with variance ~0”. The side-information is such
that in addition to (or in lieu of) Y = X + N the receiver
has access to Y, and Yb which are distributed according to
H,:

Y, = 1 + N*,
Yb = 6, + No;

Ho:

y,=

-l+N*,

Yb = 6, + N,,.

Note that Y = Y, + Yb. The nature of N* is responsible
for an interesting property of the observations Y,: either
the value of Y, is such that it determines the true hypothesis uniquely (this happens if N* takes its maximum [resp.
minimum] odd or even value and the true hypothesis is
H, [resp. Ho]> or it is useless for the hypothesis test. Note
that the latter case occurs with probability 2P,((+’ - at).
Now the optimum decoding strategy is clear: if the value
of Y, is useless, decide H, iff lYb - S, 1 < lYb - S,I, otherwise the value of Y, determines the true hypothesis with-

CHANNELS

The conclusion is that, ?e(a2), the worst-case error probability when the noise distribution is chosen without
knowing the values of 6, and S, is bounded by

I’,(u2 - a;) s &a2)
I Pe((2u/(2

+ 6, - s,,)>2).

This implies that if 6, - S, -+ 0, then p&02) * P,(a2),
because at can be chosen arbitrarily close to 0. Thus,
small deviations from the assumed input alphabet lead to
small deviations in the worst-case error probability. Furthermore, we have shown how to robustify the leastfavorable noise distributions against uncertainties in the
input values by convolving it with a low-variance Gaussian
distribution. The Gaussian shape was chosen for convenience in expressing the lower bound; in practice, the
variance and shape of the smoothing distribution can be
fine-tuned, if desired, as a function of the specific structure of the uncertainty in the input alphabet. In most
cases, background noise coexists with the jamming noise.
If the power of the background noise is comparatively
small, it is not necessary that the jammer take the foregoing robustification measures against uncertainties in the
input alphabet. If the background noise power is comparable or larger than the jammer’s, then the optimization
of the jammer’s distribution falls outside the scope of this
paper.
An important generalization of the model considered in
this section is the case of multisample equiprobable hypothesis:
H,:
Ho:

Y=X,
+N,
Y = X, + N,

where all the quantities are n-dimensional vectors: Xi, X,
are deterministic and N is a random vector satisfying
&N,12]
n

I u2.

(3.16)

If N is zero-mean Gaussian with covariance matrix u21,
then it is well known that the probability of error is
Q(m)
with
SNR =

II-q - X,l12
4a2

-

(3.17)

If N is restricted to be an i.i.d. random vector, then
finding the worst-case (maximizing the error probability of
the minimum error probability detector) marginal distribution under a power constraint (3.16) remains an interesting open problem. However, for asymptotically large it,
the worst-case distribution can be obtained by maximizing
the Chernoff entropy (Example 6) which can be carried
out following similar lines to those reported in Section IV.
If the restriction that the noise be independent is dropped,
and a completely general dependency structure is allowed

IEEE TRANSACTIONS

1504

together with the power constraint in (3.16), then the
problem can be solved using the result of Theorem 4.
Theorem 5: The maximum probability of error of the
maximum-likelihood test between
H,:

Y = Xl + N,

H,,:

Y=X,,

ON INFORMATION

Proofi Adding a deterministic vector -(Xi + X,)/2
to the observations makes the problem equivalent to an
antipodal one with X, = A = -X0. So, for convenience
we will assume that special case. First, let us show that
P,(n/SNR) is achieved by the following noise distribution:

( h%)2

min
mha”ZG‘i
tr Hsna2

H,:

Y = (+l

+ q)A,

HO:

Y=

(-1

+ v)A,

which is equivalent to the single-sample problem considered in Theorem 4 (any yi/Ai such that Ai # 0 is a
sufficient statistic) and therefore the probability of error
achieved by N* is P,(na2/11Al12> as we wanted to show.
Conversely, no other noise distribution can achieve
worse probability of error. To verify this, consider the
class of detectors that perform a maximum likelihood
decision on the scalar statistic 2 = ATY. Since this is not
necessarily a sufficient statistic (when the noise is not
white and Gaussian), the error probability of such a
detector is an upper bound to the minimum error probability for every noise distribution. The two hypothesis
become
H,:
H,,:

2 = (IAll

+ ATN,

2 = -llA1/2 + ATN,

which is equivalent to the { + 1, - 11 model of the present
section with noise power equal to

*

The probability of error achieved by the least-favorable
Gaussian distribution is Q(dw>,
therefore the comparison between the worst-case and Gaussian curves is
exactly as in the single-sample case (Fig. 1).
Iv.

CAPACITY

This section is devoted to the solution of the worst-case
capacity of the binary-input memoryless channel
yi =Xi + Ni,

N* = VA,

where n is the scalar noise (on a span-l lattice) that
maximizes the error probability in Theorem 4 when the
allowed noise variance is equal to ncr2/11A(12. With this
noise distribution, the observations become

1992

straightforward to show (cf. [34], 132, Proposition 71) that
among all covariance matrices C such that tr(z) I nc2,
X* = nc+2AAT/llA112 is the one that minimizes the
matched-filter signal-to-noise ratio

+ N,

over all n-dimensional random vectors N satisfying
E[llNl121 I ncr2 is equal to P,(n/SNR) where P, and
SNR are given in (3.3) and (3.17), respectively.

THEORY, VOL. 38, NO. 5, SEPTEMBER

(4.1)

where Xi takes values on { - 1,ll. Thus, we seek
rn$

C(u2) =

mpZ(X;

X + N),

(4.2)

E[N21S (+*

where the maximum ranges over all distributions on
{ - 1,l). Results on compound channels [33] indicate that
C( (T2>remains the capacity of the channel even when the
decoder is not informed of the noise statistics. If the
encoder does not know the noise statistics when choosing
the codebook, then the capacity of the channel is potentially lower:
max
X

rn$

I(X;X+N)

(4.3)

E[iV21<CT*

However, in the present case
rnt
mpl(X;
E[iPl~ (T2

X + N)
= max
X

rnfi 1(X; X + N)
E[i+ls u*

(4.4)

as a result of the concavity-convexity of mutual information in the respective arguments. In fact (X*, N*) is a
saddle-point for the game in (4.4), where X* is the equally
likely distribution on { - 1, l}, and N* is the symmetric
distribution that achieves’
C(8)

=

m$

I(X*;X*

+ N*).

(4.5)

E[N211 CT.2

a quantity which is upper bounded by nu2/11A114 because
of (3.16) and the Cauchy-Schwarz inequality. Consequently, the error probability of the suboptimal detector
cannot be made larger than P,(na 2/11All ) regardless of
the choice of N.
0
It is interesting to note that the least-favorable noise
distribution that achieves the worst-case error probability
in Theorem 5 has the same memory structure as the
worst-case Gaussian noise vector. To see this, it is

Note that even though the minimum in (4.5) is aption’ not
necessarily achieved by a unique distribution, one of the
distributions that achieves it must be symmetric, as an
equal mixture of any noise distribution with its mirror
image can only decrease the mutual information. To conclude the verification that (X*, N*) is a saddle-point, note
1 As mentioned in Section I (see also, e.g., [23]) the basic additivity
bounds of mutual information [8, p. 561 imply that i.i.d. input and noise
with marginal distributions X* and iV* are a saddle-point to a game
where both input and output are allowed to use memory.

SHAMAI (SHITZ) AND VERDti:

WORST-CASE POWER-CONSTRAINED

that by symmetry the optimum distribution for the symmetric noise distribution N * must be X*.
The conclusion is that (4.3) is equal to (4.5) which fits
the framework of Section II with the special case of
Example 1:
Y
d(x,y) = ;1og 7x x
. (4.6)
+ ; log 7
y+;

2+2

-

~‘Pj>P1,P2,*.*
j

i

+ E)~ + ( 1 - Cp.
i, +2

+ A’ C’pj(2j
[ i

- U2]>

(4.7)

where Cj stands for sum from ---coto 00except 0. Substituting h = 8A’ and taking the partial derivative of (4.7)
with respect to pk, k f 0, we obtain the Kuhn-Tucker
condition:

-,,,(I

+ 2)

+ h(k2 + ke) 2 0

(4.8)

with equality if pk > 0. Immediately we recognize that for
all Kuhn-Tucker conditions to be satisfied simultaneously, it is necessary that pk > 0 for all k =
*-a -l,O,l;**.
Since (4.8) depends only on consecutive
ratios and we want to decouple the doubly infinite set of
conditions into a pair of semi-infinite sets, it is convenient
to introduce the variables
r

k

Pk

=-

Pk-I

Sk = -

k = 1,2;..,


P-k

P-k+1

k = 1,2, ... .


’ The symmetrized version of N* (equal mixture of its mirror image
and itself) achieves the same mutual information.

CHANNELS

Then, the Kuhn-Tucker
sions

1505

conditions become the recur-

log(1 + rk+i) = h(k2 + ke) + log(1 + si)
, (4.9a)
lOg(l

A partial characterization of N* was obtained in [3]
where it was shown that N* puts its mass in a (possibly
noncentered) span-2 lattice. 2 This is equivalent to the
convexity of the function V(a 2>, and follows from Theorem 3 in the interval [l, a) because the smoothness conditions therein are satisfied by mutual information. However, knowing that I/ is convex does not abridge its
computation as the convex problem in (2.27) must still be
solved for each E E TO,11. Then the pointwise minimum,
or equivalently, the convex hull of that family of functions
is readily obtained.
In this case, (2.27) can be solved using the Kuhn-Tucker
theorem. However, its direct application leads to a doubly
infinite set of equations whose solution is not immediate.
To overcome this difficulty, we will transform the problem
into one with two independent semi-infinite sets of equations which are coupled by their initial conditions only.
Taking into account that the sum of the masses must
equal unity, we can eliminate one variable, say p,,, and
one Lagrange multiplier to yield the Lagrangian
J “‘P-2~P-1~1
i

NOISE FOR BINARY-INPUT

+ sk+l)

= h(k2 - ke) + lOg(l

+ S1)

+ log (1 + rl) - log
for k = 1,2, a.. . In Appendix A, it is shown that for every
h > 0, 0 I E 5 1, there exists one and only one initial
condition (r,, si) resulting in positive bounded sequences
{r&=l,{s&,l
that satisfy (4.9). Searching for the initial
condition (pi, si) which corresponds to that solution, is
numerically straightforward. Once the valid solution is
obtained, then the solution p is recovered by

I

k

Po,r_fl

Pk

=

k

PO

I

k 2 1,

5’

ks

l-Is-j,

j=l

-1,

where p,, is chosen so that the total mass equals unity.
The result of the numerical solution in the range of
SNR’s of usual interest is depicted in Fig. 7 for E = 0 and
E = l/2. The behavior of V,(a 2, for E = l/2 in Fig. 7 is
typical of curves obtained for E f 0. It is indistinguishable
from V,(g 2, at low SNR’s, and as the SNR increases it
rises above V,(a 2, until it reaches unit capacity at the
minimum allowed u2 = e2. The conclusion for all the
SNR’s considered is that the centered span-2 lattice
minimizes capacity. A conclusion sup1 *-* -2,0,2,***}
ported (for all SNR) by the result of Appendix B showing
the local optimality of zero offset under sufficient conditions which are satisfied by the kernel considered in (4.6).
The least-favorable noise distribution is shown in Fig. 8
for SNR = 0 dB and SNR = - 10 dB. For low SNR’s the
least-favorable cdf approaches a Gaussian shape (in
agreement with [5]), whereas in the region SNR > 0 dB
the least-favorable distribution is indistinguishable from a
three-mass distribution with weights (a 2/S, 1 u 2/4, u2/S) at ( - 2,0,2). This three-mass noise distribution achieves capacity equal to
log2-

(I-

$)log($

- 1)
+(1-

;)log(;

-2)

(4.10)

in the interval 0 < u2 < 4. The minimum of (4.10) and
the capacity of the Gaussian noise channel is indistinguishable from the worst-case capacity (Fig. 9). The minimum of (4.10) and i log(1 + uW2) is also a good approximation to the worst-case capacity, because binary inputs
achieve the input-power constrained capacity at low sig-

IEEE TRANSACTIONS

I
-10

5

-5

10

: SNRIIIBI
20

15

Fig. 7. Worst-case capacity achievable with span-2 lattices centered at
E, for E = l/2 (upper) and E = 0 (lower).

0.75

-10

-8

-4

-6

T

-2

0

2

4

6

810

(a)
0.25

.

T

.

.

.

.

v

-10

-8

-6

-4

-2

0

2

4

I

t

.

6

8

10

(b)

Fig. 8. Least-favorable (capacity) noise distributions for (a) SNR = 0
db and (b) - 10 dB.

nal-to-noise ratios. The nonmonotonicity of (4.10) (Fig. 9)
is a consequence of the fact that the three-mass distribution is forced to satisfy the variance constraint with equality. As g2 + 4 (and the mass at 0 vanishes) the channel
becomes noiseless. Needless to say, the least-favorable
distribution always satisfies the variance constraint with
equality.
In the high SNR region the worst-case distributions for
error probability and capacity differ in that in the former
case, the masses are placed at (- l,O, 1) whereas in the
latter, they are essentially distributed on c-2,0,2). This
implies that in the error probability problem the channel

ON INFORMATION

THEORY, VOL. 38, NO. 5, SEPTEMBER

1992

I
-10

-5

5

10

15

20

Fig. 9. Gaussian-noise capacity (dashed) and capacity achieved with
three-mass noise distribution (solid).

becomes an erasure channel, whereas in the capacity
problem we obtain a binary symmetric channel to which
“noiseless” outputs at -3 and 3 are appended. In the
single-sample hypothesis testing problem, zero outputs
(erasures) carry no useful information, in contrast to the
setting of encoded communication, in which mistaking - 1
for + 1 and viceversa is much more harmful than getting a
zero output.
It is of interest to quantify the discrepancy between the
worst-case capacity and the well-known capacity curve of
the binary input Gaussian noise channel (e.g., [12, prob.
4.221 and [2, p. 2741). The maximum difference between
Gaussian capacity and worst-case capacity (Fig. 2) is 0.118
bit and occurs at 7.2 dB, whereas the maximum relative
decrease is 12.5% occuring at 6.7 dB. The relative power
loss of worst-case capacity with respect to Gaussian capacity can be seen from Fig. 2 to grow unbounded with the
signal-to-noise ratio. However, in most applications this is
not an important comparison, because in that range, a
minimal increase in capacity requires a large increase in
signal-to-noise ratio.
It is well-known (e.g., 120, p. 1031)that for asymptotically low signal-to-noise ratio, the capacity of the inputpower constrained Gaussian channel is reduced by a factor of 7r/2 (which translates into a power loss of about 2
dB) when the output is quantized to two levels (regardless
of whether the power-limited input is further constrained
to take two values only). In this context, both input and
output quantization are assumed to be optimal, which
corresponds to a zero threshold at the channel output and
antipodal equiprobable inputs if the noise is Gaussian. It
is interesting to see how this degradation factor from
Gaussian capacity changes not only when the output is
quantized but when the noise is not Gaussian. Of course,
the answer depends on the noise distribution, but our
previous results allow us to show that (at low signal-tonoise ratio) the degradation factor is at most 3 (or a
power loss of 4.8 dB). This bound is achieved, among
other noise distributions, by the uniform density. To prove
this, we use the result of Appendix C to write the ratio of
the worst-case capacity of the input/output quantized

SHAMAI~HITZ) AND VERDI: WORST-CASE
POWER-CONSTRAINED
NOISEFORBINARY-INPUTCHANNELS
channel to the Gaussian capacity as
log2 - h(P,( f9))
7
; log 1 + $
i

(4.11)

t( u2) 2 C( a2/var (X))

h(P) =Pl%(l/P)
+ (1 -P)l%(l/(l
-P>>*
The limit of (4.11) as u --) COis equal to l/3 as can be
verified by recourse to (3.15), which shows that the bound
is not only attained by the least-favorable noise distribution but also by the uniform density. For such a noise
distribution, we cannot do better by allowing nonbinary
inputs because the input-output mutual information is
(X + W)

- lrn h(&(
--m

bound on e(a2) is obtained by allowing the choice of N
to be made with full knowledge of the distribution of X,
in which case we obtain

i

where the signal-to-noise ratio is me2 and we have denoted

h(w

1507

-x>> @x(x),

2 C(&(l

- a,)2).

(4.12)

In order to derive an upper bound to &a 2> we will
choose a specific noise distribution (cf. Section III) which
does not use the values taken by X:
N=N*

+N,,

where N* is the noise distribution that achieves (4.5) with
power a2 - u:, and N,, is zero-mean Gaussian with
variance cr$, independent of N*, X*, and A. Consider
the following inequalities,
t(u2)

sZ(X;X+

N)

I Z(X*, A;X*
+ N*, A + N,,)
which is maximized by antipodal equiprobable input when
I Z(X*; X* + N*) -I- Z( A; A + N,,)
N is symmetric. It should be emphasized that capacity
2
1
degrades asymptotically by at most a factor of 3 when the
1+%
I C(d
- a,:)+$og
7 (4.13)
binary output quantization is optimal. If the output quanutl 1
i
tization is forced to be a zero-threshold, then the worstcase (binary-input) capacity [3] is equal to 0 if u > 1 and where the second and third inequalities follow from the
log 2 - h( u 2/2> if (+ I 1. The latter expression should be data-processing lemma and the independence of N* and
compared to the worst-case capacity with optimal output No, respectively. Since the choice of 0 < u,, < a; is arbiquantization: log 2 - h(a 2/4> if g I 1. Thus, optimal trary we conclude from (4.12) and (4.13) that C( u 2> -+
output quantization buys 3 dB over straight zero- C(u2) as a, -+ 0. In other words, exact knowledge of the
thresholding if the signal-to-noise ratio exceeds 0 dB. binary input alphabet it is not necessary for the jammer to
Below 0 dB, reliable communication is possible only with approach worst-case capacity. The same observations we
optimal quantization. It can be shown (from (4.10)) that made in Section III regarding the robustification of the
for high signal-to-noise ratio, the worst-case binary-input least-favorable noise are applicable here. The more gencapacity with unquantized outputs (Fig. 2) behaves asymp- eral (and practically relevant) problem where there are
totically as log 2 - h(a 2/8>, which means that optimal constraints on the time-dependence of the input alphabet
binary output quantization costs 3 dB at high signal-to- (such as assuming that it remains constant) are captured
by a channel with memory where the sequence Ai is no
noise ratios, (and, thus, zero-thresholding costs 6 dB).
Note that the foregoing question gives the worst-case longer i.i.d. The foregoing conclusions can be shown to
capacity in the presence of output quantization. This is remain intact by considering mutual informations of ndifferent from finding the noise distribution which is most blocks of random variables.
sensitive to output quantization (i.e., that maximizes the
Analogously to the error probability problem, we now show that the worst-case capacity does not break down when the noise distribution is chosen without exact knowledge of the input alphabet. Let

    Y = X + N,
    X = X* + A,

where X* is equiprobable on {−1, +1}, and A is allowed to be dependent on X* so that X is a binary equiprobable random variable. Without loss of generality we may assume E[A] = 0. The minimum I(X; X + N) over all N independent of all other random variables and chosen without knowledge of the distribution of A such that E[N²] ≤ σ² is denoted by C̃(σ²). An immediate lower bound is given by (4.12).

V. m-ARY INPUTS

For the sake of clarity and concreteness, Section II focused on the case of a binary antipodal channel where the input X is either +1 or −1. All the results therein can be easily translated to the general binary case where X takes two arbitrary distinct values {A_1, A_2}. This is because shifting and scaling of the observations only affects the reference measure in the functional in (2.1), which is itself invariant to the reference measure because of the positive homogeneity property of the kernel. Physically, this simply says that meaningful performance functionals are invariant to scaling and shifting of the channel output. Essentially, the only change is the replacement of span-2 lattices by span-|A_1 − A_2| lattices.

More significantly, the general framework of Section II holds not only for binary but for m-ary channels. We can even allow the input-output space to be an n-dimensional

Euclidean space. The key restriction is that the input distribution places all its mass on (a finite subset of) a lattice, a very common situation in practice. The kernel has now m arguments, the noise power is now measured by ∫_{R^n} ||z||² dP_N(z), and the span-2 lattices L_ε = {ε + 2k}_{k=−∞}^{∞} are now substituted by lattices

    ε + k_1 u_1 + ··· + k_n u_n,

where {u_i ∈ R^n}_{i=1}^{n} is the basis of the input lattice; k_1, ..., k_n are integers; and ε ∈ R^n belongs to the fundamental parallelotope of the lattice (e.g., [7]),

    θ_1 u_1 + ··· + θ_n u_n,        (0 ≤ θ_i < 1).
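Numerically, this decomposition amounts to splitting the coordinates of a point in the lattice basis into integer and fractional parts. A minimal sketch (the basis U and the test point are hypothetical, chosen only for illustration):

    import numpy as np

    # Columns of U are the basis vectors u_1, u_2 of a hypothetical 2-D input lattice.
    U = np.column_stack(([2.0, 0.0], [1.0, 2.0]))

    def lattice_decompose(z, U):
        """Write z = eps + k_1 u_1 + ... + k_n u_n with integer k_i and eps in the
        fundamental parallelotope {theta_1 u_1 + ... + theta_n u_n : 0 <= theta_i < 1}."""
        theta = np.linalg.solve(U, z)        # coordinates of z in the basis
        k = np.floor(theta).astype(int)      # integer part: the lattice point U @ k
        eps = U @ (theta - k)                # fractional part: offset in the parallelotope
        return k, eps

    z = np.array([3.7, -1.2])                # arbitrary test point
    k, eps = lattice_decompose(z, U)
    print(k, eps, np.allclose(U @ k + eps, z))   # reconstruction check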

The function V_ε(σ²) becomes the worst-case payoff achieved by power-σ² lattice distributions centered at ε and whose basis is equal to that of the input lattice. As in Section II, V(σ²) is defined by further minimizing with respect to ε. It is then straightforward to check that both Theorem 1 and Theorem 2 apply verbatim.

Although not a key result from the viewpoint of computing worst-case performance, Theorem 3 is an auxiliary result which contributes to the characterization of the worst-case noise distribution in problems with smooth kernels. In particular, it shows that above a certain variance, σ_0², the worst-case noise takes values on the input lattice (subject to a possible shift). In the case of the {−1, 1} input constellation we found³ σ_0 = 1 for smooth kernels. It is possible to generalize this result (under the same smoothness constraints) to show that σ_0 is equal to one half of the maximum distance between any two points of the input constellation. To verify this, recall that according to the proof of Theorem 3, σ_0 is such that a power-σ² two-mass noise distribution N with masses at N_1 and N_2 with ||N_1|| = ||N_2|| can only be least favorable if σ ≤ σ_0. But since the least-favorable distribution must have zero mean, N_1 = −N_2 and σ = ||N_1||. Moreover, there must exist two input points X_i and X_j such that

    X_i + N_1 = X_j + N_2,

for otherwise the convexity and homogeneity of the kernel imply that N cannot be least favorable (cf. proof of Theorem 3; physically, the input can always be obtained from the output unless this condition is satisfied). Thus, N_1 = (X_j − X_i)/2, and we can take σ_0 = (1/2) max_{i,j} ||X_i − X_j||.

³ Note that σ_0 is an upper bound to the minimum variance for which the least-favorable distribution takes values on a single lattice. For example, in the case of capacity, the minimum variance is 0, whereas for error probability, it is +∞.
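For a concrete constellation, σ_0 is simply half of the largest pairwise distance. For instance (the four-point constellation below is hypothetical and serves only to illustrate the formula; the same computation for {−1, +1} gives σ_0 = 1):

    import numpy as np
    from itertools import combinations

    points = np.array([[0, 0], [2, 0], [0, 2], [2, 2]], dtype=float)  # hypothetical constellation
    sigma0 = 0.5 * max(np.linalg.norm(a - b) for a, b in combinations(points, 2))
    print(sigma0)   # half the diagonal: sqrt(2)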

ACKNOWLEDGMENT

Appendix A benefited from discussions with N. Vvedenskaya of the Russian Academy of Sciences. Discussions with A. Lapidoth of the Technion are also acknowledged.

APPENDIX A

We show here that there exists a unique initial condition (r_1, s_1) for the recursion (4.9) such that both {r_k}_{k=1}^{∞} and {s_k}_{k=1}^{∞} are positive and bounded. To show uniqueness, suppose that two different ({r_k}_{k=1}^{∞}, s_1) and ({r'_k}_{k=1}^{∞}, s'_1) are positive and bounded (the reasoning is the same if the difference is in the sequence {s_k}). Assume that

    (1 + r_1)(1 + s_1) ≥ (1 + r'_1)(1 + s'_1)        (A.1)

and let Δ_k = r_k − r'_k. Then (4.9a) results in

    Δ_{k+1} = [(1 + r_k)(1 + s_k) − (1 + r'_k)(1 + s'_k)] exp(Δ_k/2 + Δ_k ε)        (A.2)
            ≥ Δ_k ((1 + r_1)(1 + s_1) / ((1 + r_k)(1 + r'_k))) exp(Δ_k/2 + Δ_k ε).        (A.3)

Now, using the assumption that both {r_k} and {r'_k} are bounded and positive, it follows from (A.3) that Δ_k → ∞, which contradicts the assumption.

For every choice of (r_1, s_1) ∈ R_+² we can have three different behaviors for each of the sequences {r_k} and {s_k}: a) it takes negative values, b) it stays positive and bounded, and c) it grows without bound. The positive quadrant can be partitioned into four regions, whose interiors are characterized respectively by the behaviors (a, a), (a, c), (c, a), and (c, c) for {r_k} and {s_k}, respectively. Note that those sets are indeed open because r_k and s_k are continuous functions of (r_1, s_1). At the boundary between two regions where either behavior switches, the corresponding behavior is of type b (e.g., the boundary line between the regions (a, a) and (a, c) is of type (a, b)). Since the region around the origin is of type (a, a) and sufficiently large pairs belong to (c, c), there must exist at least one point which is in the boundary of either (a, a) and (c, c) or (a, c) and (c, a). Therefore, there must exist an initial condition leading to (b, b) behavior. And, as we saw, the initial condition is in fact unique.

APPENDIX B

In this appendix, we show that for symmetric and twice differentiable kernels, a noise distribution that places the mass points on a centered even-integer (..., −2, 0, 2, ...) lattice is locally optimal. If also (∂²d/∂x∂y)(ω, ω) < 0 for 0 < ω ≤ 1, then the centered odd-integer (..., −3, −1, 1, 3, ...) lattice (if it is not precluded by the variance constraint) is also locally optimal. This result supports the numerical observation in Section III that the average mutual information is globally minimized by a symmetric noise distribution having its support on the even integers.

We assume throughout this appendix that d is symmetric,

    d(x, y) = d(y, x),        (B.1)

and that the partial derivatives d_x, d_y, d_xx, d_yy, and d_xy exist. Let p_k = P[N = 2k + ε] be the probability of the random variable N taking on the value 2k + ε, where k is an integer and⁴ ε ∈ (−1, 1]. Let p stand for the vector {p_k}.

⁴ In Appendix B, it is more convenient to use the interval (−1, +1] in lieu of the [0, 2) interval used throughout the paper.

We are interested in the values of p and ε that achieve the minimum of

    J(p) = Σ_{k=−∞}^{∞} d(p_k, p_{k+1})        (B.2)

under the constraints

    p_k ≥ 0,        (B.3)
    Σ_{k=−∞}^{∞} p_k = 1,        (B.4)
    Σ_{k=−∞}^{∞} (2k + ε)² p_k ≤ σ².        (B.5)

We saw in (2.13) that this optimization problem is equivalent to minimizing (B.2) under the constraints (B.3), (B.4), and (B.6),

    Σ_{k=−∞}^{∞} k² p_k − (Σ_{k=−∞}^{∞} k p_k)² ≤ σ²/4,        (B.6)

and the optimal ε is chosen so that the mean of N is zero, that is,

    ε = −2 Σ_{k=−∞}^{∞} k p_k,        (B.7)

where the indices of p_k are selected so that ε ∈ (−1, 1]. It was also argued that min J(p) is a monotonically decreasing function of σ², and therefore constraint (B.5), or equivalently (B.6), is satisfied with an equality sign.

We now show that ε = 0 is a local minimum. To that end, we define

    α_k = (p_k + p_{−k})/2,        (B.8a)
    β_k = (p_k − p_{−k})/2,        (B.8b)

or equivalently

    p_k = α_k + β_k,        (B.9a)
    p_{−k} = α_k − β_k.        (B.9b)

The optimization problem in terms of {α_k}, {β_k} is the minimization of

    J(α, β) = Σ_{k=0}^{∞} [d(α_k + β_k, α_{k+1} + β_{k+1}) + d(α_{k+1} − β_{k+1}, α_k − β_k)]        (B.10)

under the constraints

    α_0 + 2 Σ_{k=1}^{∞} α_k = 1,        (B.11)
    2 Σ_{k=1}^{∞} k² α_k − 4 (Σ_{k=1}^{∞} k β_k)² ≤ σ²/4,        (B.12)
    α_k ≥ 0,        (B.13)
    −α_k ≤ β_k ≤ α_k.        (B.14)
(B.14)
Before undertaking this optimization, we focus on a related
optimization where E = 0, that is, we preassume an even
(centered) lattice. The corresponding optimization (B.2)-(B.5)
happens to be a standard convex program with a compact space
{p} and therefore a global optimum is guaranteed. Further it is
seen that in this case pi =pT, where the asterisk denotes the
optimizing probabilities. This is because, by symmetry, if {pk} is
a solution so is {pek} and due to convexity, using the distribution {1/2p, + 1/2p-,} will yield a decreased value of the
target function, unless pk = pek. Thus, the optimization prob-

i

=

(B.12)

k=O

(B.16)

where A, and A, are nonnegative Lagrange multipliers.
Now we return to the original optimization problem
(B.lO)-(B.14) and relax the constraint (B.14). We will show that
the pair (a*, p* = 0) is a local minimum and since constraint
(B.14) is also satisfied, this point is also a local minimum of the
original optimization problem. Due to the Lagrange theorem,
the pair (01*, p* = 0) conjectured to give a local minimum
should satisfy

Q’
2 5 k2ak -

k=l

,,j2,,,,

‘@k

k =

O’}

(Y,+2Cak-l

L(a,p)=J(a,p)-hl

under the constraints
‘Ycf2Ca,I,

(B.15)

is positive definite. We obtain the Lagrangian

dL(a*, P*)
o
&Yk
= )
fik+l,

M - 1,

and the Hessian M x M matrix

Pk+d

-

1509

k = 0, l;..,

0,

=

(B.7)

k= --m

where the indices of pk are selected so that E E (- 1, 11. It was
also argued that min J(p) is a monotonic decreasing function of
m2 and therefore constraint (BS), or equivalently (B.6), is
satisfied with an equality sign.
We now show that E = 0 is a local minimum. To that end, we
define

p-k

dL(a*,O)

03.6)

I

CHANNELS

A =

aij =
i

B =

bij =

di+jqcw*, p*)
dcYiffj

(B.21a)
1

dLi+j(,*,p*)

*, *
= ““;;,;,’

da’@

i
c =

Cij =

1

)

dL’+‘( lx*, p*)
dp i ap j

and where i, j = 0, 1,2;.., M - 1.

(B.21b)

(B.21~)
I
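The blocks in (B.20)-(B.21) can be formed numerically by finite differences for a truncated problem; this also makes it easy to verify the fact, used below, that the cross block B vanishes at β = 0. In the sketch the kernel, the multipliers λ_1 and λ_2, the truncation M, and the test point are all arbitrary stand-ins, not values taken from the paper:

    import numpy as np

    def d(x, y):
        # Stand-in symmetric, positively homogeneous kernel (illustration only).
        s, t = x + y, 0.0
        if x > 0:
            t += x * np.log(2 * x / s)
        if y > 0:
            t += y * np.log(2 * y / s)
        return t

    M, lam1, lam2, sigma2 = 3, 0.1, 0.2, 2.0      # arbitrary illustration values

    def L(alpha, beta):
        # Truncated Lagrangian (B.17) with the stand-in kernel.
        a, b = np.append(alpha, 0.0), np.append(beta, 0.0)
        J = sum(d(a[k] + b[k], a[k + 1] + b[k + 1]) +
                d(a[k + 1] - b[k + 1], a[k] - b[k]) for k in range(M))
        k = np.arange(M)
        return (J - lam1 * (a[0] + 2 * a[1:M].sum() - 1)
                  - lam2 * (2 * (k**2 * a[:M]).sum()
                            - 4 * (k * b[:M]).sum()**2 - sigma2 / 4))

    def hessian(f, z, eps=1e-4):
        # Central finite-difference Hessian of a scalar function f at z.
        n = len(z)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                zpp = z.copy(); zpp[i] += eps; zpp[j] += eps
                zpm = z.copy(); zpm[i] += eps; zpm[j] -= eps
                zmp = z.copy(); zmp[i] -= eps; zmp[j] += eps
                zmm = z.copy(); zmm[i] -= eps; zmm[j] -= eps
                H[i, j] = (f(zpp) - f(zpm) - f(zmp) + f(zmm)) / (4 * eps**2)
        return H

    alpha, beta = np.array([0.5, 0.2, 0.05]), np.zeros(M)   # arbitrary point with beta = 0
    H = hessian(lambda z: L(z[:M], z[M:]), np.concatenate([alpha, beta]))
    A, B, C = H[:M, :M], H[:M, M:], H[M:, M:]
    print(np.max(np.abs(B)))    # numerically ~0: B vanishes at beta = 0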

The partial derivatives are given by

    ∂L(α, β)/∂α_k = d_x(α_0 + β_0, α_1 + β_1) + d_y(α_1 − β_1, α_0 − β_0) − λ_1,        k = 0,

    ∂L(α, β)/∂α_k = d_x(α_k + β_k, α_{k+1} + β_{k+1}) + d_y(α_{k−1} + β_{k−1}, α_k + β_k)
                    + d_x(α_k − β_k, α_{k−1} − β_{k−1}) + d_y(α_{k+1} − β_{k+1}, α_k − β_k)
                    − 2λ_1 − 2k²λ_2,        k = 1, 2, ..., M − 1,        (B.22)

    ∂L(α, β)/∂β_k = d_x(α_0 + β_0, α_1 + β_1) − d_y(α_1 − β_1, α_0 − β_0),        k = 0,

    ∂L(α, β)/∂β_k = d_x(α_k + β_k, α_{k+1} + β_{k+1}) + d_y(α_{k−1} + β_{k−1}, α_k + β_k)
                    − d_x(α_k − β_k, α_{k−1} − β_{k−1}) − d_y(α_{k+1} − β_{k+1}, α_k − β_k)
                    + 8λ_2 k Σ_{j=1}^{∞} j β_j,        k = 1, 2, ..., M − 1.        (B.23)

Using the symmetry (B.1), which implies d_x(x, y) = d_y(y, x), and noting that α* satisfies (B.13), implies that

    ∂L(α*, β* = 0)/∂β_k = 0,        ∀ k = 0, 1, ..., M.        (B.24)

Turning now to the Hessian matrix Q, it follows after some straightforward calculations that B = 0 and C = A + D, where the elements of D are d_ij = 8λ_2 ij. Since A is positive definite (B.16) and D is positive semidefinite, C = A + D is positive definite as well; it follows that Q is also positive definite, completing the proof.

We turn now to the case of the odd-integer lattice. It is convenient to rewrite J(p) in (B.2) as

    J(p) = d(p_{−1}, p_0) + Σ_{k=1}^{∞} [d(p_{k−1}, p_k) + d(p_{−(k+1)}, p_{−k})].        (B.25)

Let α_k and β_k be given by

    α_k = (p_k + p_{−(k+1)})/2,        (B.26a)
    β_k = (p_k − p_{−(k+1)})/2,        (B.26b)

or equivalently

    p_k = α_k + β_k,        (B.27a)
    p_{−(k+1)} = α_k − β_k.        (B.27b)

The optimization problem (B.2)-(B.5), in terms of α and β of (B.26), is the minimization of

    J(α, β) = d(α_0 − β_0, α_0 + β_0)
              + Σ_{k=1}^{∞} [d(α_{k−1} + β_{k−1}, α_k + β_k) + d(α_k − β_k, α_{k−1} − β_{k−1})]        (B.28)

under the constraints

    2 Σ_{k=0}^{∞} α_k = 1,        (B.29)
    Σ_{k=0}^{∞} (2k + 1)² α_k − [Σ_{k=0}^{∞} (2k + 1) β_k]² = σ²/2,        (B.30)
    α_k ≥ 0,        (B.31)
    −α_k ≤ β_k ≤ α_k.        (B.32)

The equality in (B.30) follows again due to the monotonic decrease of min J(p) with σ². We assume now that σ² ≥ 1, for otherwise no distribution on the odd integers can satisfy the power constraint. We proceed similarly to the previous case. Setting ε = 1, it is easy to show that, since we deal with standard convex programming, a global minimum exists, and due to symmetry and convexity p_k = p_{−(k+1)}, k = 0, 1, 2, ... (that is, β_k = 0). Therefore, the problem is equivalent to that formulated by (B.28)-(B.31) with β_k = 0 for all k. Letting α* stand for the optimizing vector (with M nonzero components, where M → ∞ if α_k > 0 for all k), it follows, by repeating the same steps as in the previous proof, that the point (α*, β* = 0) satisfies

    ∂L(α*, β* = 0)/∂α_k = ∂L(α*, β* = 0)/∂β_k = 0,        k = 0, 1, ..., M − 1,        (B.33)

where now

    L(α, β) = J(α, β) − λ_1 (2 Σ_{k=0}^{∞} α_k − 1)
                      − λ_2 (Σ_{k=0}^{∞} (2k + 1)² α_k − [Σ_{k=0}^{∞} (2k + 1) β_k]² − σ²/2),        (B.34)

and where λ_1, λ_2 are nonnegative Lagrange multipliers. To show that (α*, β* = 0) is indeed a local minimum, we turn to examine the Hessian matrix Q = ∇²L(α*, β* = 0), which can be expressed by (B.20). Also here B = 0 and C = A + D; however, in the present case the components of D are given by

    d_ij = 2λ_2 − 4 d_xy(α_0*, α_0*),        i = j = 0,
    d_ij = 2λ_2 (2i + 1)(2j + 1),        otherwise.        (B.35)

A sufficient condition for D to be positive definite, implying therefore that Q is also positive definite (A is positive definite by the fact that α* achieves the local minimum for ε = 1), is d_xy(ω, ω) < 0 for all 0 < ω ≤ 1. This condition is satisfied, for example, in the special cases of mutual information and error exponent, for which d(x, y) was specified in Section II.
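The sufficient condition d_xy(ω, ω) < 0 is easy to check numerically for a candidate kernel. The sketch below does so by central finite differences, again with a stand-in symmetric, positively homogeneous kernel (illustration only; for this particular d one can also verify analytically that d_xy(x, y) = −1/(x + y) < 0):

    import numpy as np

    def d(x, y):
        # Stand-in symmetric, positively homogeneous kernel (illustration only).
        s = x + y
        return x * np.log(2 * x / s) + y * np.log(2 * y / s)

    def d_xy(x, y, eps=1e-5):
        # Central finite-difference estimate of the mixed partial derivative.
        return (d(x + eps, y + eps) - d(x + eps, y - eps)
                - d(x - eps, y + eps) + d(x - eps, y - eps)) / (4 * eps**2)

    for w in (0.1, 0.3, 0.5, 0.8, 1.0):
        print(w, d_xy(w, w))    # all negative, so the condition holds for this kernel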
APPENDIX C

We find here the worst-case capacity of the channel in (4.1) when the decoder is constrained to use a binary-quantized version of each output y_i. Denote by C(N, Q) the capacity achieved with an arbitrary binary quantization rule Q and noise distribution N. Since we allow the decoder to optimally quantize each output using the knowledge of the noise distribution, the objective is to find

    min_{N: E[N²] ≤ σ²} max_Q C(N, Q).

Let q_N be the minimum probability of error of the equiprobable test (1.1) for an arbitrary noise distribution N, and denote by R(N) the set of decision rules that achieve q_N. Furthermore, let N* be the noise that maximizes q_N under a power constraint of σ² (found in Section III), i.e.,

    q_{N*} = P_e(σ²),        (C.1)

and let Q* ∈ R(N*) be such that the useless channel output values are mapped to +1 and −1 with probability 1/2 each. It follows from the concavity of the binary entropy function that the capacity of a binary-input binary-output channel with crossover probabilities p and q satisfies

    C ≥ log 2 − ((1 − p)/2 + q/2) h((1 − p)/(1 − p + q))
              − ((1 − q)/2 + p/2) h((1 − q)/(1 − q + p)).        (C.2)

Then, the following relations hold:

    C(N*, Q*) = log 2 − h(P_e(σ²))
              = min_{N: E[N²] ≤ σ²} [log 2 − h(q_N)]
              ≤ min_{N: E[N²] ≤ σ²} max_{Q ∈ R(N)} C(N, Q)
              ≤ min_{N: E[N²] ≤ σ²} max_Q C(N, Q)
              ≤ max_Q C(N*, Q)
              = C(N*, Q*),        (C.3)

where the first inequality follows from (C.2) and the last equality follows because N* effectively turns (4.1) into a symmetric erasure channel, the optimum binary output quantization of which is easily seen to map the erasure symbol to each output with probability 1/2.

We conclude from (C.3) that the sought-after quantity is

    min_{N: E[N²] ≤ σ²} max_Q C(N, Q) = log 2 − h(P_e(σ²)).

This is indeed intuitively plausible: the binary output quantization that maximizes capacity is one of the decision rules that minimizes error probability.
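Given the worst-case error probability P_e(σ²) of Section III, the worst-case binary-quantized capacity above is a one-line computation. A minimal sketch (capacities in nats, with P_e supplied as an input rather than recomputed here):

    import numpy as np

    def h(p):
        # binary entropy in nats, with h(0) = h(1) = 0
        return 0.0 if p in (0.0, 1.0) else -p * np.log(p) - (1 - p) * np.log(1 - p)

    def worst_case_quantized_capacity(pe):
        # log 2 - h(P_e(sigma^2)), as established in (C.3)
        return np.log(2) - h(pe)

    print(worst_case_quantized_capacity(0.25))    # example: P_e = 1/4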
REFERENCES

[1] N. M. Blachman, "Communication as a game," IRE WESCON Rec., vol. 2, pp. 61-66, 1957.
[2] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.
[3] J. M. Borden, D. M. Mason, and R. J. McEliece, "Some information theoretic saddlepoints," SIAM J. Contr. Optimiz., vol. 23, pp. 129-143, Jan. 1985.
[4] J. A. Bucklew, Large Deviation Techniques in Decision, Simulation and Estimation. New York: Wiley-Interscience, 1990.
[5] C. R. Cahn, "Worst interference for coherent binary channel," IEEE Trans. Inform. Theory, vol. IT-17, pp. 209-210, Mar. 1971.
[6] L. Le Cam, Asymptotic Methods in Statistical Decision Theory. New York: Springer, 1986.
[7] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups. New York: Springer-Verlag, 1988.
[8] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[9] I. Csiszár and P. Narayan, "Capacity and decoding rules of arbitrarily varying channels," IEEE Trans. Inform. Theory, vol. 35, pp. 752-764, July 1989.
[10] R. L. Dobrushin, "Optimum information transmission through a channel with unknown parameters," Radiotech. Electron., vol. 4, pp. 1951-1956, 1959.
[11] T. Ericson, "The arbitrary varying channel and the jamming problem," Acta Electron. Sinica, vol. 14, no. 4, pp. 21-35, July 1986.
[12] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[13] F. E. Glave, "An upper bound on the probability of error due to intersymbol interference for correlated digital signals," IEEE Trans. Inform. Theory, vol. IT-18, pp. 356-363, May 1972.
[14] E. Hansler, "An upper bound of error probability in data transmission systems," Nachrichtentech. Z., vol. 12, pp. 625-627, 1970.
[15] M. V. Hegde, W. E. Stark, and D. Teneketzis, "On the capacity of channels with unknown interference," IEEE Trans. Inform. Theory, vol. 35, pp. 770-783, July 1989.
[16] B. Hughes and P. Narayan, "Interleaving and channels with unknown memory," in Proc. 19th Conf. Inform. Sci. Syst., Johns Hopkins Univ., Baltimore, MD, Mar. 1985, pp. 722-725.
[17] B. Hughes and P. Narayan, "Gaussian arbitrarily varying channels," IEEE Trans. Inform. Theory, vol. IT-33, pp. 267-284, Mar. 1987.
[18] M. Loève, Probability Theory. New York: Springer, 1977.
[19] D. G. Luenberger, Optimization by Vector Space Methods. New York: Wiley, 1969.
[20] R. J. McEliece, The Theory of Information and Coding: A Mathematical Framework for Communication. Reading, MA: Addison-Wesley, 1977.
[21] R. J. McEliece and W. E. Stark, "An information-theoretic study of communication in the presence of jamming," in Proc. IEEE Int. Conf. Commun., 1981, pp. 45.3.1-45.3.5.
[22] R. J. McEliece, "Communication in the presence of jamming - An information theoretic approach," in Secure Digital Communications, G. Longo, Ed. New York: Springer-Verlag, 1983, pp. 127-166.
[23] R. J. McEliece and W. E. Stark, "Channels with block interference," IEEE Trans. Inform. Theory, vol. IT-30, pp. 44-54, Jan. 1984.
[24] J. L. Morris, "On single-sample robust detection of known signals with additive unknown-mean amplitude-bounded random interference," IEEE Trans. Inform. Theory, vol. IT-26, pp. 199-205, Mar. 1980.
[25] J. L. Morris, "On single-sample robust detection of known signals with additive unknown-mean amplitude-bounded random interference. Part II: The randomized decision rule solution," IEEE Trans. Inform. Theory, vol. IT-27, pp. 132-136, Jan. 1981.
[26] J. L. Morris and N. E. Dennis, "A random-threshold decision rule for known signals with additive amplitude-bounded nonstationary random interference," IEEE Trans. Commun., vol. 38, pp. 160-164, Feb. 1990.
[27] R. T. Rockafellar, Convex Analysis. Princeton, NJ: Princeton Univ. Press, 1970.
[28] W. L. Root, "Communications through unspecified additive noise," Inform. Contr., vol. 4, pp. 15-29, 1961.
[29] W. E. Stark, D. Teneketzis, and S. K. Park, "Worst-case analysis of partial-band interference," in Proc. 1986 Conf. Inform. Sci. Syst., Princeton Univ., Princeton, NJ, 1986, pp. 379-383.
[30] W. E. Stark and R. J. McEliece, "On the capacity of channels with block memory," IEEE Trans. Inform. Theory, vol. 34, pp. 322-324, Mar. 1988.
[31] I. G. Stiglitz, "Coding for a class of unknown channels," IEEE Trans. Inform. Theory, vol. 12, pp. 189-195, Apr. 1966.
[32] S. Verdú and H. V. Poor, "Minimax robust discrete-time matched filters," IEEE Trans. Commun., vol. COM-31, pp. 208-215, Feb. 1983.
[33] J. Wolfowitz, Coding Theorems of Information Theory, 3rd ed. New York: Springer, 1978.
[34] L. H. Zetterberg, "Signal detection under noise interference in a game situation," IRE Trans. Inform. Theory, vol. IT-8, pp. 47-52, Sept. 1962.
[35] Y. I. Zhodzishskiy, "Maximum guaranteed noise immunity of bit-by-bit reception with limiting on interference mean power," Radiotekhnika, no. 10, pp. 56-57, 1986.
