Combining Images through Fusion
October 1999
Eli Shechtman
TAU / Electrical Engineering
Introduction
The development of new imaging sensors has brought with it a need for image processing techniques
that can effectively fuse images from different sensors into a single composite for interpretation. To
date fusion has been considered primarily as a means for presenting images to humans. For example,
IR and visible images may be fused as an aid to pilots landing in poor weather, or CT and NMR images
may be fused as an aid to medical diagnosis. It may be expected that fusion can become equally
important in combining images, and hence in compressing source image data, for interpretation by
computer vision systems. An image fusion technique is successful to the extent that it creates a
composite that retains all useful information from the source images, and does not introduce artifacts
that could interfere with interpretation.
The most direct approach to fusion is to sum and average the source images. Unfortunately this can
produce unsatisfactory results. Features that appear in one source image but not in others are rendered
in the composite at reduced contrast or superimposed on features from other images, as in a
photographic double exposure.
Some of the most promising approaches to fusion are those that perform image combination in a
pyramid transform domain: An image pyramid is first constructed for each source image, then a
pyramid is formed for the composite image by selecting coefficients from the source image pyramids.
Finally, the composite image is recovered through an inverse pyramid transform.
Several variations on pyramid-based fusion have been described in the literature (1984-1992).
These methods generally appear to provide good results. One limitation is in the fusion of patterns that
have roughly equal salience but opposite contrast. This is a pathological case since image averaging
results in pattern cancellation, and selection is unstable.
In this paper I present an extension to the pyramid approach to fusion in which the result is more stable
and less sensitive to noise, and in which the pathological case of patterns with opposite contrast is
mitigated (as shown in Figure 1).
Problem Statement
A general view of image fusion is suggested in the following Figure:

A set of 2 or more source images, I1, I2, …, is obtained of a given scene viewed with different sensors
or under different imaging conditions. Camera focus or exposure may be varied from image to image,
or the positions of light sources may be changed. The source images may be obtained as objects move
to reveal different portions of the background. In any case, each of the source images represents a
partial view of the scene, and contains both “valid” and “invalid” data for the task at hand. The
objective of image fusion is to combine the source images to form one composite image, F, (or multiple
composite images, F1, F2, …) in such a way that valid data are retained from all source images, while
invalid data are discarded. The combined image should also appear “natural” so that it can be readily
interpreted by humans or machines using normal vision capabilities.
One of the basic assumptions of the fusion algorithm is that the source images are aligned (in
registration). This can be achieved by obtaining all source images with the same sensor and optics. It
can also be achieved by electronically transforming (warping) the images, or by applying a registration
algorithm to images taken from nearby positions. Those alignment algorithms are beyond the scope of this
paper; in all my examples I used pre-aligned images.
3 Pyramid-Based Fusion Algorithm
The pyramid implements a “pattern selective” approach to image fusion, so that the composite image is
constructed not a pixel at a time, but a feature at a time. Salient features are identified in each source
image, then copied, intact, to the composite image. Less salient features that may partially mask the more
salient features are discarded. In this way features included in the composite are rendered at full
contrast, and double exposure artifacts are avoided. According to this view, the pyramid transform
decomposes each source image into a set of component patterns, the basis functions of the transform.
Pyramid basis functions are compact, self-similar, and occur at many scales (i.e., they are wavelets).
To simplify notation I’ll show the case in which there are just two source images, A and B, and a single
composite image, C, although the methods described can be extended to larger numbers of source and
composite images. A general framework for pyramid based image fusion is shown in following Figure:

[Figure: source images A and B are decomposed into source pyramids D_A and D_B; the match measure M_AB and the salience measures S_A, S_B drive the Select stage that forms the composite pyramid D_C, from which the composite image C is recovered.]
Image fusion is carried out in four steps:
Step (1) - Construct a pyramid transform for each source image.
Step (2) - Compute match (MAB) and saliency (SA, SB) measures for the source images at each pyramid
sample position.
Step (3) - Combine source pyramids to form a pyramid for the composite image.
Step (4) - Recover the composite image through an inverse pyramid transform.
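As a rough illustration of these four steps, the following Python sketch fuses two pre-aligned grayscale arrays using a plain Laplacian pyramid and a simple "keep the larger magnitude" selection rule. It is only a simplified stand-in for the gradient-pyramid transform and the match/salience rules developed below; every function name and parameter choice in it is mine rather than part of the original method.

import numpy as np
from scipy.signal import convolve2d

W = np.outer([1, 2, 1], [1, 2, 1]) / 16.0            # 3x3 binomial generating kernel

def _reduce(img):
    # Filter with w, then decimate by 2 (step (1) building block).
    return convolve2d(img, W, mode='same', boundary='symm')[::2, ::2]

def _expand(img, shape):
    # Zero-pad up-sample by 2, then interpolate with 4*w.
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * convolve2d(up, W, mode='same', boundary='symm')

def _laplacian_pyramid(img, K=4):
    gauss = [np.asarray(img, dtype=float)]
    for _ in range(K):
        gauss.append(_reduce(gauss[-1]))
    lap = [g - _expand(g_next, g.shape) for g, g_next in zip(gauss[:-1], gauss[1:])]
    return lap, gauss[-1]                             # band-pass levels + top level

def fuse_simple(A, B, K=4):
    lap_A, top_A = _laplacian_pyramid(A, K)           # step (1): pyramid transforms
    lap_B, top_B = _laplacian_pyramid(B, K)
    fused = [np.where(np.abs(a) >= np.abs(b), a, b)   # steps (2)+(3): crude selection
             for a, b in zip(lap_A, lap_B)]
    out = 0.5 * (top_A + top_B)                       # average the low-pass residue
    for level in reversed(fused):                     # step (4): inverse transform
        out = level + _expand(out, level.shape)
    return out

Called as fuse_simple(A, B) on two equally sized float images, this returns a fused array that still needs the normalization discussed in Section 3.4.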
3.1 Pyramid transform ( step (1) )
3.1.1 General definitions
The fusion process begins with the construction of image pyramids DA and DB for the two source
images. Let $D_I(m,n,k,l)$ be the pyramid transform of image $I(i,j)$. The indices $(i,j)$ indicate sample
position in the original image. The indices $(m,n,k,l)$ indicate sample position, level, and orientation in the
pyramid. To simplify notation, let $\bar{m} = (m,n,k,l)$ indicate a sample location in the pyramid.
In this paper I use the gradient pyramid transform for fusion.
3.1.2 The Gradient Pyramid Transform
A gradient pyramid for image I can be obtained by applying a gradient operator to each level of its
Gaussian pyramid representation. The image can be completely represented by a set of four such
gradient pyramids, one each for derivatives in horizontal, vertical, and two diagonal directions. Here
I’ll first review definitions for Gaussian and Laplacian pyramid transforms, and then the additional
steps for the gradient pyramid transform.
3.1.3 Standard Gaussian and Laplacian Pyramids
Let $G_k$ be the $k$-th level of the Gaussian pyramid for image I. Then $G_0(i,j) \equiv I(i,j)$ and, for $k>0$,

$$G_k = \left[\, w * G_{k-1} \,\right]_{\downarrow 2} \qquad (1)$$

Here w is the generating kernel (pre-sampling filter), and the notation $[\,\cdot\,]_{\downarrow n}$ indicates sub-sampling by n (decimation).
This pyramid decomposition is illustrated in Fig. (1a).
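For concreteness, a minimal Python sketch of Eq. (1), assuming the 3 by 3 binomial kernel of Eq. (9) and symmetric boundary handling; the function names are my own.

import numpy as np
from scipy.signal import convolve2d

W = np.outer([1, 2, 1], [1, 2, 1]) / 16.0        # generating kernel w of Eq. (9)

def reduce_level(G_k):
    # One REDUCE step of Eq. (1): filter with w, then sub-sample by 2.
    filtered = convolve2d(G_k, W, mode='same', boundary='symm')
    return filtered[::2, ::2]

def gaussian_pyramid(image, K):
    # Levels G_0 ... G_K of the Gaussian pyramid.
    levels = [np.asarray(image, dtype=float)]
    for _ in range(K):
        levels.append(reduce_level(levels[-1]))
    return levels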
Let $\tilde{L}_k$ be the $k$-th level of the RE (reduce-expand) Laplacian pyramid. This is defined as the difference
between successive levels of the Gaussian pyramid:

$$\tilde{L}_k = G_k - 4\, w * \left[\, G_{k+1} \,\right]_{\uparrow 2} \qquad (2)$$

Here $[\,\cdot\,]_{\uparrow n}$ indicates up-sampling by n ("zero padding" of n-1 rows and columns between data samples), and w is
the interpolating filter (interpolator).
An image is recovered (step (4) above) from its RE Laplacian pyramid by reversing these steps. Let $\hat{G}$
be the Gaussian recovered from the Laplacian. Reconstruction requires all levels of the RE Laplacian,
as well as the top level of the original Gaussian pyramid: $\hat{G}_K = G_K$, and for $k<K$,

$$\hat{G}_k = \tilde{L}_k + 4\, w * \left[\, \hat{G}_{k+1} \,\right]_{\uparrow 2} \qquad (3)$$

Iterative application of this procedure yields $\hat{G}_0$, the reconstructed version of the original image $G_0$.
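Eqs. (2) and (3) might be sketched as follows; the zero-padding up-sampler, the boundary handling, and all names are my own simplifications. gaussian_levels is a list $G_0 \dots G_K$ such as the one built in the previous sketch.

import numpy as np
from scipy.signal import convolve2d

W = np.outer([1, 2, 1], [1, 2, 1]) / 16.0            # generating kernel w (Eq. (9))

def expand_level(G_next, shape):
    # 4 w * [G_{k+1}] up-sampled by 2 (zero padding), brought back to `shape`.
    up = np.zeros(shape)
    up[::2, ::2] = G_next
    return 4.0 * convolve2d(up, W, mode='same', boundary='symm')

def re_laplacian_pyramid(gaussian_levels):
    # RE Laplacian levels per Eq. (2), plus the top Gaussian level.
    laplacian = [G_k - expand_level(G_next, G_k.shape)
                 for G_k, G_next in zip(gaussian_levels[:-1], gaussian_levels[1:])]
    return laplacian, gaussian_levels[-1]

def reconstruct(laplacian, top):
    # Iterate Eq. (3) from the top level down to recover G^_0.
    G_hat = top
    for L_k in reversed(laplacian):
        G_hat = L_k + expand_level(G_hat, L_k.shape)
    return G_hat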
K, the "height" (number of levels) of the pyramid, is one of the parameters of this algorithm. Its influence
is described in par. 4.2.1.
Spatial Frequency Domain
The Gaussian pyramid transform described in Eq. (1) and the RE Laplacian pyramid may analogously
be expressed in the frequency domain:

$$G_{k+1}\!\left(e^{j\omega_1}, e^{j\omega_2}\right) = \tfrac{1}{4}\, W\!\left(e^{j\omega_1/2}, e^{j\omega_2/2}\right) G_k\!\left(e^{j\omega_1/2}, e^{j\omega_2/2}\right) \qquad (4)$$

$$\tilde{L}_k\!\left(e^{j\omega_1}, e^{j\omega_2}\right) = G_k\!\left(e^{j\omega_1}, e^{j\omega_2}\right) - 4\, W\!\left(e^{j\omega_1}, e^{j\omega_2}\right) G_{k+1}\!\left(e^{j2\omega_1}, e^{j2\omega_2}\right) = \left[1 - W^2\!\left(e^{j\omega_1}, e^{j\omega_2}\right)\right] G_k\!\left(e^{j\omega_1}, e^{j\omega_2}\right) \qquad (5)$$
The frequency domain transform is also illustrated in the following Figure:

[Figure: one-dimensional spectra over $-\pi \le \omega \le \pi$ of the original image $G_0$, the generating kernel $W$ (cutoff near $\pi/2$), the successive Gaussian levels $G_1$, $G_2$, and the Laplacian levels $\tilde{L}_1$, $\tilde{L}_2$.]
3.1.4 FSD Laplacian
Let $L_k$ be the $k$-th level of the FSD (filter-subtract-decimate) Laplacian pyramid. This is defined as the
difference between $G_k$ and the filtered copy of $G_k$ prior to sub-sampling to form $G_{k+1}$:

$$L_k = G_k - w * G_k = [1 - w] * G_k \qquad (6)$$

It can be shown that the RE Laplacian can also be derived (approximately) without sub/up-sampling:

$$\tilde{L}_k \approx [1 - w * w] * G_k = [1 + w] * [1 - w] * G_k \qquad (7)$$

Thus, to good approximation (for practical applications), levels of the FSD Laplacian can be converted
to levels of the RE Laplacian through a simple filter convolution:

$$\tilde{L}_k \approx [1 + w] * L_k \qquad (8)$$

The practical difference between the RE and FSD Laplacian is shown in Fig. (1b) for level k=2.
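A short sketch of Eqs. (6) and (8) under the same assumptions as before (3 by 3 kernel of Eq. (9), symmetric borders); the names are mine.

import numpy as np
from scipy.signal import convolve2d

W = np.outer([1, 2, 1], [1, 2, 1]) / 16.0        # generating kernel w, Eq. (9)

def fsd_laplacian(G_k):
    # L_k = G_k - w * G_k = [1 - w] * G_k  (Eq. (6))
    return G_k - convolve2d(G_k, W, mode='same', boundary='symm')

def fsd_to_re(L_k):
    # L~_k ~= [1 + w] * L_k  (Eq. (8))
    return L_k + convolve2d(L_k, W, mode='same', boundary='symm')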
3.1.5 Gradient Pyramid
Assume the generating kernel w used in constructing the Gaussian pyramid is the 3 by 3 filter with
binomial coefficients:

$$w = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \qquad (9)$$

Let $D_{kl}$ be the $k$-th level and $l$-th orientation gradient pyramid image for I. $D_{kl}$ is obtained from $G_k$ through
convolution with the gradient filter $d_l$:

$$D_{kl} = d_l * [1 + w] * G_k \qquad (10)$$
where

$$d_1 = \begin{bmatrix} 1 & -1 \end{bmatrix}, \quad d_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \quad d_3 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \quad d_4 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Note that $D_{kl}$ also includes convolution with $[1 + w]$, which implements the FSD-to-RE Laplacian conversion
(according to Eq. (8)). The RE Laplacian is needed as an intermediate result for reconstruction (according
to Eq. (3)).
Also note that the orientation filters $d_l$ were chosen so that:

$$1 - w = \frac{1}{8}\left( d_1' * d_1 + d_2' * d_2 + d_3' * d_3 + d_4' * d_4 \right) = \frac{1}{16}\begin{bmatrix} -1 & -2 & -1 \\ -2 & 12 & -2 \\ -1 & -2 & -1 \end{bmatrix} \qquad (11)$$
Each gradient pyramid level $D_{kl}$ is converted to a corresponding second derivative pyramid (or oriented
Laplacian) level $\hat{L}_{kl}$:

$$\hat{L}_{kl} = -\frac{1}{8}\, d_l * D_{kl} \qquad (12)$$

The oriented Laplacian levels and orientations are represented in Figure (2).
The oriented Laplacian pyramids are then summed to form (according to Eq. (7)) an approximate RE Laplacian
pyramid, $\tilde{L}_k$:

$$\tilde{L}_k = \sum_{l=1}^{4} \hat{L}_{kl} \qquad (13)$$
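Eqs. (10), (12) and (13) could be sketched as below, using the filter set $d_1 \dots d_4$ given above. The same-size convolutions and symmetric borders are my own simplifications, so the summed result only approximates the RE Laplacian level, as the text notes; all names are mine.

import numpy as np
from scipy.signal import convolve2d

W = np.outer([1, 2, 1], [1, 2, 1]) / 16.0                      # kernel w of Eq. (9)
D_FILTERS = [
    np.array([[1.0, -1.0]]),                                   # d1: horizontal
    np.array([[0.0, -1.0], [1.0, 0.0]]) / np.sqrt(2.0),        # d2: diagonal
    np.array([[-1.0], [1.0]]),                                  # d3: vertical
    np.array([[-1.0, 0.0], [0.0, 1.0]]) / np.sqrt(2.0),        # d4: diagonal
]

def _conv(img, kernel):
    return convolve2d(img, kernel, mode='same', boundary='symm')

def gradient_level(G_k):
    # D_kl = d_l * [1 + w] * G_k for l = 1..4  (Eq. (10))
    base = G_k + _conv(G_k, W)
    return [_conv(base, d) for d in D_FILTERS]

def oriented_laplacians(D_k):
    # L^_kl = -(1/8) d_l * D_kl  (Eq. (12))
    return [-_conv(D_kl, d) / 8.0 for d, D_kl in zip(D_FILTERS, D_k)]

def re_laplacian_from_gradients(G_k):
    # The sum of the four oriented Laplacians approximates L~_k  (Eq. (13))
    return sum(oriented_laplacians(gradient_level(G_k)))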
The influence of the FSD-to-RE transform approximation on the final composite image is shown in Fig.
(3). The reconstructed image (b) is visually indistinguishable from the original one (a); the difference
is noticeable only in the difference histogram (d).
3.2 Match and Salience measures ( step (2) )
In pattern-selective fusion the composite image is assembled from selected component patterns of the
source images. In the pyramid implementation, the pyramid basis functions serve as the component
patterns.
Here we define two distinct modes of combination: selection and averaging:
At sample locations where the source images are distinctly different, the combination process selects
the most salient component pattern from the source pyramids and copies it to the composite pyramid,
while discarding less salient patterns. As mentioned before the selection mode avoids double exposure
artifacts.
At sample locations where the source images are similar, the process averages the source patterns. The
averaging reduces noise and provides stability where source images contain the same pattern
information.
Pattern selective image fusion is guided by two measures: a match measure that determines the mode of
combination at each sample position (selection or averaging), and salience measures that determine
which source pattern is chosen in the selection mode.
Salience measure: The salience of a particular component pattern is high if that pattern plays a role in
representing important information in the scene. There can be various measures of salience, such as
amplitude, contrast, or other criteria appropriate to the vision task. I define salience at sample $\bar{m}$ as the
local energy, or variance, within a neighborhood p:

$$S_I(\bar{m}) = \sum_{m',\, n'} p(m', n')\, D_I(m + m',\, n + n',\, k,\, l)^2 \qquad (14)$$

In practice the neighborhood p is small, typically including only the sample itself (point case) or a 3 by
3 or 5 by 5 array of samples centered on the sample (area case).
Match measure: The match measure is used to determine which of the two combination modes to use at
each sample position, selection or averaging. This measure could be based on the relative amplitudes of
corresponding patterns in the two source pyramids. Alternatively, here the match at sample $\bar{m}$ is
defined as a local normalized correlation within the neighborhood p:

$$M_{AB}(\bar{m}) = \frac{2 \sum_{m',\, n'} p(m', n')\, D_A(m + m',\, n + n',\, k,\, l)\, D_B(m + m',\, n + n',\, k,\, l)}{S_A(\bar{m}) + S_B(\bar{m})} \qquad (15)$$

The neighborhood p may have the same sizes mentioned for the salience measure.
$M_{AB}$ has value 1 for identical patterns, value -1 for patterns that are identical except for an opposite
sign, and a value between -1 and 1 for all other patterns. Unlike the usual definition of normalized
correlation, it has a value less than 1 for patterns that are identical except for a scale factor.
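Eqs. (14) and (15) for one pyramid level and orientation might look like this in Python, assuming the 3 by 3 area case for p and a small epsilon (my addition) to guard against division by zero:

import numpy as np
from scipy.signal import convolve2d

P = np.ones((3, 3))          # neighborhood weights p(m', n'), 3x3 area case
EPS = 1e-12                  # guard against division by zero (my addition)

def salience(D):
    # S_I = sum over p of D_I^2  (Eq. (14)), for one level and orientation
    return convolve2d(D * D, P, mode='same', boundary='symm')

def match(D_A, D_B):
    # M_AB = 2 * sum over p of D_A*D_B / (S_A + S_B)  (Eq. (15))
    cross = convolve2d(D_A * D_B, P, mode='same', boundary='symm')
    return 2.0 * cross / (salience(D_A) + salience(D_B) + EPS)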
3.3 Combine source pyramids ( step (3) )
The pattern selective step mentioned previously is repeated at each pyramid sample position. The
combination rule (averaging and selection) can be stated as a weighted average in which weights
depend on the match and saliency measures. At each position m

we assign weights wA and wB to the
source images, and define the combined result as:
) ( ) ( ) ( ) ( ) ( m D m w m D m w m D
B B A A C
    
+ · (16)
The functional relationship between the weights and salience and match measures is described in the
following chart:

For the source values $D_A(\bar{m})$, $D_B(\bar{m})$ at each sample:

  if $M_{AB} \le \alpha$:   $w_{\min} = 0$
  if $M_{AB} > \alpha$:     $w_{\min} = \frac{1}{2} - \frac{1}{2}\left(\frac{1 - M_{AB}}{1 - \alpha}\right)$
  in both cases:            $w_{\max} = 1 - w_{\min}$

  if $S_A \ge S_B$:   $w_A = w_{\max}$, $w_B = w_{\min}$
  if $S_A < S_B$:     $w_A = w_{\min}$, $w_B = w_{\max}$

  $D_C(\bar{m}) = w_A(\bar{m})\, D_A(\bar{m}) + w_B(\bar{m})\, D_B(\bar{m})$
This relationship implements the following combination rules:
• If the similarity is low, i.e., the match measure is below the threshold α, then the weights are 1 and 0 (selection mode).
• If the similarity is high, with correlation near 1, then the weights are both ½ (averaging mode).
• The weights increase and decrease linearly between these two cases (there are also non-linear methods).
• In all cases, the larger weight is assigned to the source with the larger salience value.
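The weight rule above, together with Eq. (16), could be sketched as follows. The salience, match and weight arrays are per-sample maps for one pyramid level and orientation, alpha is the match threshold of Section 4.2.2, and the names are my own.

import numpy as np

def combine(D_A, D_B, S_A, S_B, M_AB, alpha=0.85):
    # Weighted average of Eq. (16); the weights follow the chart above.
    denom = max(1.0 - alpha, 1e-12)                       # keeps alpha = 1 finite
    w_min = np.where(M_AB <= alpha,
                     0.0,                                  # selection mode
                     0.5 - 0.5 * (1.0 - M_AB) / denom)     # graded averaging
    w_max = 1.0 - w_min
    w_A = np.where(S_A >= S_B, w_max, w_min)               # favor the more salient source
    w_B = 1.0 - w_A
    return w_A * D_A + w_B * D_B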

3.4 Inverse Transform ( step (4) )
The combination described above is applied to each level of the oriented Laplacian pyramids, $\hat{L}_{kl}$, and to
the top levels of the two source Gaussian pyramids, $G_K$. Reconstruction is completed by converting the
composite RE Laplacian pyramid and the composite Gaussian top level into a complete composite Gaussian
pyramid (according to Eq. (3)). Finally the fused reconstructed image, $\hat{G}_0$, is obtained. In practice the
fused image often has gray levels beyond the image format range [0, 255], so it has to be normalized.
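The final normalization step might be sketched as a simple linear rescaling onto [0, 255]; the choice of linear rescaling rather than clipping is my own assumption.

import numpy as np

def normalize_to_uint8(G0_hat):
    # Map the fused image linearly onto [0, 255] and quantize.
    lo, hi = float(G0_hat.min()), float(G0_hat.max())
    if hi <= lo:
        return np.zeros_like(G0_hat, dtype=np.uint8)
    scaled = (G0_hat - lo) / (hi - lo)
    return np.round(255.0 * scaled).astype(np.uint8)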
4 Examples
4.1 Pattern selective fusion vs. combination through pixel averaging
4.1.1 Multi-sensor
Figure (4) shows the fusion of images obtained with different sensors. Figs. (a) and (b) show a view of a
field with houses, trees, paths and a large square (black & white) water tank. The scene was imaged
twice: with an IR sensor, in which hotter surfaces appear lighter and vice versa, and with a visible-light
sensor (CCD). The two images are pre-aligned. In the IR image many details (rocks, bushes, paths and
others) whose temperature differs from that of their background have much higher contrast than in the
visible one. In the visible image we can see much more clearly objects with high color contrast (like the
tank and the shed beside it). Note that the pictures were taken at twilight, so the visible image looks
"pale" and only salient objects are seen clearly. Each source image therefore shows certain aspects of
the scene that are not visible in the other source.
Image fusion by pyramid-based pattern selection (with typical parameters) and by simple pixel
averaging are shown in Figure (5a) and (5b), respectively. Note that the pixel-based method has a
"muddy" appearance compared to the pattern-based method. This is due primarily to the fact that
averaging reduces the contrast of all patterns that appear in only one source. Feature contrast is
maintained in the pattern-selective approach, and all significant features from both sources are retained
in the composite.
Note: the pattern-selection fused image is normalized (as described in Section 3.4), so it looks a little
lighter than the averaged one, but it still shows higher-contrast details (such as the tank, the shed, the
paths and the rocks/bushes) than the averaged image.
4.1.2 Multi-exposure
In this example (Figure (6)) I synthesized, from a single dragon-tale picture, two images with differently
modified contrast and brightness. Figures (a) and (b) have the effect of low and high exposure,
respectively. Each image has details whose inner structure is seen clearly in only one of them. The
candlelight, the window and some of the books are clearer in the "darker" picture, Fig. (a), while the
wall patterns, the ceiling, the floor, the lattice, some books and the dragon's tail and wings are clearer in
the "lighter" picture, Fig. (b).
Again the pattern-selective image (Fig. (c)) includes all significant features from both source images,
while in the pixel-averaging image the features are less clear or not observable at all, such as the details
of the dragon's tail and wings. Fig. (d) also suffers from the "double exposure" effect, which is observed
near the candle and in the candlelight itself, and also in the bottom inscription.
4.1.3 Multi-focus
Figure (7) shows another application of fusion: extending the effective depth of field of a camera. A
picture of a puma was synthetically blurred twice to create two new images. In Fig. (a) the puma's legs
and in Fig. (b) its head area were gradually blurred. This is analogous to photographing the same scene
with a narrow depth-of-field camera, each time with a different focus distance.
This example shows most persuasively the superiority of pattern-selective fusion over pixel averaging:
the image in Fig. (c) is as sharp as the original image, while the image in Fig. (d) is completely blurred.
4.1.4 Corrupted image
In Fig. (8) we can see images of an enlarged segment of the view in Fig. (4). This time the visible
image (Fig. (b)) is corrupted at the bottom (missing information due to the registration algorithm). The
fused image fills the corrupted area with details from the IR image (except for a thin stripe near the edge,
which remained gray because it was averaged rather than selected), while in the averaged image the
corruption is still present and distracting.
4.1.5 Opposite contrast
This advantage of pattern-selective fusion applies to a case that is quite rare but can still occur: the
source images contain details with equal salience but opposite contrast (sign). Fig. (9) shows an artificial
image with this problem. The difference between the two fusion techniques is clear.
Note that, according to the pattern-selective algorithm suggested in this paper, the gray level of the
selected area will always be the lighter of the two source images, which is not always preferable.
4.2 Parameters’ sensitivity
4.2.1 K – Number of levels in the pyramid
Figure (10) shows the effect of the number of levels in the decomposition pyramid on the fusion result.
It should be remembered that each level and orientation is fused separately. It can be seen that using
fewer levels (K=2, 3) causes small details that exist in only one of the sources to disappear. In the K=5
case the averaging effect is more dominant than in the other cases (especially in the corrupted area).
K=4 is therefore the optimal choice, although all four cases give good results.
4.2.2 α - Match threshold value between selection and averaging
Modifications of α are shown in Fig. (11). Although the fusion algorithm is remarkably insensitive to
changes in this parameter, a slight loss of contrast can be noticed in the α=0 case (and even less so for
α=0.4), due to the increased fraction of component patterns that are averaged. Choosing α=1 (or another
high value) may lead to an "unstable" image (with "jumps" from source to source), since only the
selection mode is then used. α=0.85 was therefore selected for all the other examples.
4.2.3 p – Size of the neighborhood used for match and salience computation
The results in Fig. (12) are also extremely insensitive to modifications of p. It is expected that a large p
(5 by 5 or larger) might also cause, in sharper images, some loss of contrast, due to the decreased
contribution of the individual sample value (an averaging effect).
The above examples show that although the algorithm is very stable with respect to changes in its
parameters, there is still much work to do to find the optimal values for each task.
5 Summary
We have presented a general approach to image fusion and have shown that it can be applied to diverse
fusion tasks: extending the effective spectral range of a sensor, extending the depth of focus of a
camera, or extending its dynamic range. Fusion is performed in a pyramid transform domain. The
implementation described here is characterized in two important respects. First, a measure of pattern
match has been introduced to control the mode used in image combination, selection or averaging.
Second, both this match measure and the salience measure are defined as functions of the
neighborhood of each pyramid sample rather than functions of only the sample itself. This algorithm
also provides at least a partial solution to the problem of combining components that have roughly
equal salience but opposite contrast: mismatched patterns always are handled through selection, never
averaging. The fusion algorithm was also found to be remarkably insensitive to changes in its
parameters, suggesting that the procedure is both robust and generic.
It appears that there may be an equally important role for fusion in obtaining images even when the full
signal range of interest can be recorded at once with a single sensor. Multi-spectral sensors (IR and
visible) are now becoming available. It might be assumed that use of such a sensor would eliminate the
need for fusing IR and visible images in our example (Section 4.1.1). However, the image obtained from
such a "broadband" sensor would be similar to the averaged image, and would appear less clear than the
fused one. In short, it may often be possible to obtain higher quality images for analysis by fusing a set of
images obtained under controlled and restricted (narrowband) imaging conditions than by direct image
capture with a broadband sensor. This technique is relatively new and a very powerful tool, and it will
probably be used in future image capturing systems.

Bibliography: “Enhanced Image Capture Through Fusion”, Peter J. Burt and Raymond J. Kolczynski, Fourth International
Conference on Computer Vision (Berlin, Germany, May 11-14, 1993).
