Detection and Tracking of Moving
Objects from a Moving Platform
Gérard Medioni
Institute of Robotics and Intelligent Systems
Computer Science Department
Viterbi School of Engineering
University of Southern California
Problem Definition
• Scenario: rigidly moving objects + moving camera
• Goal
• Motion segmentation: motion regions / background area
• Tracking of multiple objects: consistent track(s) over time
• Geo-registration and Geo-tracking: Geo-referenced mosaic and tracks
Scenario example 1 – moving cameras
[Figure: pipeline from moving cameras through image stabilization, motion segmentation and tracking; outputs: mosaic, mosaic + tracks]
Scenario example 2 - moving cameras with a map
[Figure: pipeline combining a moving camera and a map: image stabilization, then geo-registration]
Motion Segmentation
• Two-frame pixel-level segmentation?
• Segmentation within a temporal window
• Accumulate the pixels warped from adjacent frames
• K-Means to find the most representative pixel
• Frame differencing and thresholding: $|I_{\text{original}} - I_{\text{model}}| > \Delta I$
[Figure: temporal window of frames t−w … t … t+w; t: reference frame, w: half size of the window]
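The temporal-window steps above can be sketched in a few lines. This is a minimal sketch, not the author's implementation: it assumes the 2w+1 frames have already been warped into reference frame t and converted to grayscale, and it takes the "most representative" value at each pixel to be the center of the most populated cluster of a 2-cluster 1D k-means. The function names, `k`, and the threshold value `delta_i` (standing for ΔI) are hypothetical.

```python
import numpy as np


def kmeans_1d(values, k=2, iters=10):
    """Tiny 1D k-means on one pixel's samples; returns (centers, labels)."""
    centers = np.linspace(values.min(), values.max(), k)  # spread initial centers
    labels = np.zeros(len(values), dtype=int)
    for _ in range(iters):
        # Assign every sample to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            members = values[labels == j]
            if members.size:
                centers[j] = members.mean()
    return centers, labels


def background_model(warped, k=2):
    """warped: array (2w+1, H, W) of frames already warped to reference frame t."""
    _, h, w = warped.shape
    model = np.empty((h, w), dtype=float)
    for r in range(h):
        for c in range(w):
            centers, labels = kmeans_1d(warped[:, r, c].astype(float), k)
            # Assumed "most representative" value: center of the largest cluster.
            model[r, c] = centers[np.bincount(labels, minlength=k).argmax()]
    return model


def motion_mask(original, model, delta_i=30.0):
    """Frame differencing and thresholding: |I_original - I_model| > delta_I."""
    return np.abs(original.astype(float) - model) > delta_i
```

For example, with five constant frames and a bright pixel that appears only in the reference frame, the model keeps the background value there and the mask flags just that pixel.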
Experimental Results (1)
[Figure rows: original images; motion probability maps; initial detection results; tracking results]
Experimental Results (2)
[Figure rows: original images; motion probability maps; initial detection results; tracking results]
Experimental Results (3)
A synthesized video without motion regions
Outline
• Introduction
• 2D motion segmentation
• Tracking of multiple moving objects
• Geo-registration and geo-tracking
• Summary and discussion
Problem statement: multiple target tracking
• Input: foreground regions in each frame
• Output: trajectories with consistent track IDs
• Challenges:
• Noisy foreground regions
• Occlusions
Problematic underlying assumption
• One-to-one assumption
• One target can correspond to at most one observation
• One observation can be associated with at most one target
• Appropriate for point-like (punctual) observations
• The one-to-one assumption may not hold for visual tracking
[Figure: observations from radar, a UAV camera, and a stationary camera]
Related work
• One-to-one assumption
  • (Pasula et al., 99) Gibbs sampling to compute joint DA
    • MAP, multi-scan, uniform prior (no missing or false detections)
  • (Cong et al., 04) Approximate association probabilities in JPDAF
    • MMSE, MCMC outperforms JPDAF, one-scan/multi-scan
  • (Sastry et al., 04) MCMC to compute joint DA with an unknown number of targets
    • MAP, multi-scan, outperforms MHT, considers temporal association only
  • (F. Dellaert et al., 03) MCMC for SfM without correspondences
    • MMSE, single scan, similar to JPDAF
• Our method: overcome the one-to-one assumption
  • MAP, multi-scan, considers both spatial and temporal association
Anatomy of the problem
• “Explain” foreground regions:
• It is hard to do in a single frame without using any model information
• It becomes solvable if smoothness in motion and appearance is used
Explanation of foreground regions
• Two ways to explain foreground regions
Precisely
Approximately
Labeling of foreground regions
• The label(s) of a pixel indicate the track ID(s)
• Each pixel can have multiple labels
to represent occlusions
• Accurate but expensive!
Cover of foreground regions
• A set of shapes (rectangles)
• Rectangles can overlap one another to represent occlusions
• Approximate but efficient!
Our formulation
• Given
• A set of noisy observations (foreground regions)
• Find
• A cover ω of the foreground regions over time, where each track $\tau_k$ is a sequence of shapes (rectangles)
Solution space
• The solution space Ω is the collection of all spatio-temporal covers of the observations Y.
• “Joint association event”: $\omega = \{\tau_1, \tau_2, \ldots, \tau_K\}$
• Two kinds of data association
• Spatial data association - change the cover at one instant
• Temporal data association - form consistent tracks
• Uncovered area belongs to false alarms
[Figure: (a) observations Y; (b) one possible cover of Y]
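To make the notion of a cover concrete, here is a minimal sketch, assuming foreground regions are given as a set of pixel coordinates per frame and each track stores at most one axis-aligned rectangle per frame (half-open bounds). The names `Rect`, `Track`, and `uncovered_foreground` are hypothetical. Foreground pixels outside every rectangle are the ones the formulation attributes to false alarms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (x, y)


@dataclass(frozen=True)
class Rect:
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive

    def contains(self, p: Pixel) -> bool:
        x, y = p
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


@dataclass
class Track:
    """tau_k: a sequence of rectangles indexed by frame time."""
    track_id: int
    boxes: Dict[int, Rect] = field(default_factory=dict)


def uncovered_foreground(foreground: Dict[int, Set[Pixel]],
                         cover: List[Track]) -> Dict[int, Set[Pixel]]:
    """Per frame, the foreground pixels not covered by any rectangle of omega."""
    result = {}
    for t, pixels in foreground.items():
        boxes = [trk.boxes[t] for trk in cover if t in trk.boxes]
        result[t] = {p for p in pixels if not any(b.contains(p) for b in boxes)}
    return result
```

Summing the sizes of the returned sets gives one simple measure of the uncovered (false-alarm) area F.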
Bayesian formulation
• MAP estimate
$\omega^* = \arg\max_{\omega \in \Omega} p(\omega \mid Y)$
$p(\omega \mid Y) \propto p(Y \mid \omega)\, p(\omega)$
• Prior model p(ω)
• A small number of long tracks
• A track should overlap little with other tracks unless necessary
$p(\omega) = p(L)\, p(K)\, p(O)$ (L: track lengths, K: number of tracks, O: overlap between tracks)
• Likelihood p(Y | ω)
• Smoothness in both motion and appearance
• Areas of uncovered false alarms p(F)
$p(Y \mid \omega) = p(F) \prod_{k=1}^{K} \prod_{i=1}^{|\tau_k|-1} L(\tau_k(t_{i+1}) \mid \tau_k(t_i))$
where the pairwise term L combines a motion likelihood and an appearance likelihood
Motion and appearance likelihood
• Motion: linear Gaussian model
$x^k_{t+1} = A^k x^k_t + w, \quad w \sim N(0, Q)$
$y^k_t = H^k x^k_t + v, \quad v \sim N(0, R)$
$L_M(\tau_k(t_{i+1}) \mid \tau_k(t_i)) \equiv p(\tau_k(t_{i+1}) \mid \tau_k(t_i))$
• Appearance
$L_A(\tau_k(t_{i+1}) \mid \tau_k(t_i)) = \frac{1}{z_3} \exp\left(-\lambda_3 D(\tau_k(t_i), \tau_k(t_{i+1}))\right)$
where $D(\tau_k(t_i), \tau_k(t_{i+1}))$ is the Kullback-Leibler (KL) distance between the two RGB color histograms
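Both likelihood terms can be sketched as follows, under stated assumptions. The motion term uses the standard Kalman-filter predictive density of the linear Gaussian motion model above. The constant-velocity `A` and position-only `H` in the usage are hypothetical choices. The appearance term uses the one-sided KL divergence KL(p‖q) between normalized histograms, with a small epsilon to avoid log(0); the slides do not say whether a symmetrized form is used. `lam3` and `z3` stand for λ3 and z3, whose values are not given here.

```python
import numpy as np


def motion_likelihood(x, P, y_next, A, H, Q, R):
    """Kalman predictive density N(y_next; H A x, H (A P A^T + Q) H^T + R)."""
    x_pred = A @ x                      # predicted state
    P_pred = A @ P @ A.T + Q            # predicted state covariance
    S = H @ P_pred @ H.T + R            # innovation covariance
    r = y_next - H @ x_pred             # innovation
    mahal = r @ np.linalg.solve(S, r)
    norm = np.sqrt((2 * np.pi) ** len(r) * np.linalg.det(S))
    return float(np.exp(-0.5 * mahal) / norm)


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two histograms, normalized after epsilon smoothing."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def appearance_likelihood(hist_i, hist_next, lam3=1.0, z3=1.0):
    """L_A = (1/z3) exp(-lam3 * D), with D the KL distance between histograms."""
    return float(np.exp(-lam3 * kl_divergence(hist_i, hist_next)) / z3)
```

Identical histograms give D = 0 and hence L_A = 1/z3, and an observation close to the predicted position scores higher than a distant one.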
MAP of the full posterior p(ω | Y)
• MAP estimation of such a posterior is not a trivial task
• Even determining the parameters of such a posterior is not easy
$p(\omega \mid Y) \propto \exp\{C_0 S_{len} - C_1 K - C_2 F - C_3 S_{olp} - C_4 S_{app} - S_{mot}\}$
MAP estimation is therefore equivalent to minimizing an energy function.
• Solution to MAP:
• Sampling based method to avoid enumerating all possible solutions
• Two types of proposal moves (temporal and spatial moves)
• Symmetric temporal information
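Because the posterior is an exponential of a weighted sum, its logarithm is linear in the statistics, and the MAP cover minimizes the negated exponent. Below is a minimal sketch, assuming the statistics S_len, K, F, S_olp, S_app and S_mot have already been computed for a cover. The `CoverStats` container, the meanings given in its comments, and the weight values in any usage are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class CoverStats:
    s_len: float  # S_len: track-length term (assumed meaning)
    k: int        # K: number of tracks
    f: float      # F: uncovered (false-alarm) area
    s_olp: float  # S_olp: overlap between tracks (assumed meaning)
    s_app: float  # S_app: appearance cost (assumed meaning)
    s_mot: float  # S_mot: motion cost (assumed meaning)


def log_posterior(s, c):
    """Unnormalized log p(omega|Y) = C0 S_len - C1 K - C2 F - C3 S_olp - C4 S_app - S_mot."""
    c0, c1, c2, c3, c4 = c
    return c0 * s.s_len - c1 * s.k - c2 * s.f - c3 * s.s_olp - c4 * s.s_app - s.s_mot


def energy(s, c):
    """Maximizing p(omega|Y) is the same as minimizing this energy."""
    return -log_posterior(s, c)


def posterior_ratio(s_new, s_old, c):
    """p(omega'|Y) / p(omega|Y): the unknown normalizing constant cancels."""
    return math.exp(log_posterior(s_new, c) - log_posterior(s_old, c))
```

Only differences of the exponent matter for comparing two covers, which is what the sampling-based search relies on.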
Markov Chain Monte Carlo
• Basic idea: construct a Markov chain which will converge to
the target distribution
• State of the Markov chain is defined in Ω
• Transition of the Markov chain is guided by a proposal distribution
• Metropolis-Hastings algorithm
• Propose a new state ω’ from the previous state $\omega^{(i)}$:
$\omega' \sim q(\omega' \mid \omega^{(i)})$
• Accept ω’ with probability
$\min\left(1, \frac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right)$
• Properties
• Only the local ratio p(ω’)/p(ω) is needed, not the globally normalized p(ω)
• For MAP, there is no need to keep the whole chain, only the current state and the best one
Metropolis-Hastings algorithm
1. Initialize $\omega^{(0)}$.
2. For i = 0 to N−1 (N is the length of the Markov chain)
- Sample $u \sim U[0,1]$
- Propose $\omega' \sim q(\omega' \mid \omega^{(i)})$ (q() is called the proposal distribution)
- Compute $A(\omega^{(i)}, \omega') = \min\left(1, \frac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right)$
- If $u < A(\omega^{(i)}, \omega')$, then $\omega^{(i+1)} = \omega'$; else $\omega^{(i+1)} = \omega^{(i)}$
Endfor
As $N \to \infty$, the chain $\{\omega^{(0)}, \ldots, \omega^{(N)}\}$ converges to $p(\omega)$.
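The loop above translates almost line for line into code. This is a minimal sketch on a toy discrete state space. It assumes a target known only up to a constant through its log (`log_p`) and a proposal given by a sampler plus its log density. Following the properties listed earlier, it keeps only the current state and the best state seen, which is all a MAP search needs. `metropolis_hastings_map` and the toy target are hypothetical.

```python
import math
import random


def metropolis_hastings_map(log_p, propose, log_q, omega0, n_iter, rng):
    """Run MH for n_iter steps and return (best_state, best_log_p).

    log_p(w): unnormalized log target; propose(w, rng): sample w' ~ q(.|w);
    log_q(a, b): log q(a | b).
    """
    current, cur_lp = omega0, log_p(omega0)
    best, best_lp = current, cur_lp
    for _ in range(n_iter):
        u = rng.random()                       # u ~ U[0, 1)
        cand = propose(current, rng)           # w' ~ q(w' | w^(i))
        cand_lp = log_p(cand)
        # log of p(w') q(w^(i) | w') / (p(w^(i)) q(w' | w^(i)))
        log_ratio = (cand_lp + log_q(current, cand)) - (cur_lp + log_q(cand, current))
        if u < math.exp(min(0.0, log_ratio)):  # A = min(1, ratio)
            current, cur_lp = cand, cand_lp
        if cur_lp > best_lp:                   # keep the best state for MAP
            best, best_lp = current, cur_lp
    return best, best_lp


# Toy usage: states 0..19 on a ring, target peaked at 13.
def toy_log_p(x):
    return -((x - 13) ** 2) / 4.0


def toy_propose(x, rng):
    return (x + rng.choice((-1, 1))) % 20      # symmetric random walk


def toy_log_q(a, b):
    return math.log(0.5)                       # each neighbour has probability 1/2


best, best_lp = metropolis_hastings_map(toy_log_p, toy_propose, toy_log_q,
                                        0, 2000, random.Random(0))
```

With a symmetric proposal the q terms cancel. They are kept here because the data-driven proposals used for data association are generally not symmetric.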
Two types of q(ω’ | ω)
• Temporal moves and spatial moves to drive the Markov chain
• Data-driven proposal: $q(\omega' \mid \omega) \rightarrow q(\omega' \mid \omega, D)$
• Spatial moves are made only after enough temporal information is collected
• Symmetric temporal information
• Forward and backward (e.g. extension)
• Deal with occlusions at the very beginning
Temporal moves: Birth/Death, Extension/Reduction, Split/Merge, Switch
Spatial moves: Segmentation/Aggregation, Diffusion
MCMC Data Association
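1. Initialize $\omega^{(0)}$.
2. For i = 0 to N−1
- Sample $u \sim U[0,1]$
- If $i < \varepsilon \cdot N$, sample $\omega' \sim q_{Temporal}(\omega' \mid \omega^{(i)})$; else sample $\omega' \sim q_{All}(\omega' \mid \omega^{(i)})$
- Compute $A(\omega^{(i)}, \omega') = \min\left(1, \frac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right)$
- If $u < A(\omega^{(i)}, \omega')$, then $\omega^{(i+1)} = \omega'$; else $\omega^{(i+1)} = \omega^{(i)}$
Endfor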
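The main change from plain Metropolis-Hastings is the schedule that restricts the first ε·N iterations to temporal moves. Below is a minimal sketch of that selection, assuming each move type is picked uniformly within the allowed family. The move names come from the temporal and spatial move types listed earlier. `choose_move` and the uniform choice are hypothetical, since the slides describe the proposals as data-driven.

```python
import random

TEMPORAL_MOVES = ("birth", "death", "extension", "reduction", "split", "merge", "switch")
SPATIAL_MOVES = ("segmentation", "aggregation", "diffusion")


def choose_move(i, n_iter, eps, rng):
    """Return the move type used to propose omega' at iteration i.

    For i < eps * N only temporal moves (q_Temporal) are allowed, so tracks
    collect enough temporal information before spatial moves (q_All) start.
    """
    if i < eps * n_iter:
        return rng.choice(TEMPORAL_MOVES)
    return rng.choice(TEMPORAL_MOVES + SPATIAL_MOVES)


# Usage: a 1000-iteration schedule with eps = 0.3.
rng = random.Random(1)
schedule = [choose_move(i, 1000, 0.3, rng) for i in range(1000)]
```

Each chosen move type would then generate a concrete ω' that is accepted or rejected exactly as in the loop above.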
Determining Parameters
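• Determine the parameters of the full posterior
• An arbitrary (casual) setting can make the ground-truth posterior p(ω_gt | Y) much lower than that of the “solution” found.
• Take advantage of a property of MCMC: only ratios of posteriors are needed
$p(\omega \mid Y) \propto \exp\{C_0 S_{len} - C_1 K - C_2 F - C_3 S_{olp} - C_4 S_{app} - S_{mot}\}$
• Degrade ω_gt into a cover ω’ and require the ground truth to score at least as high:
$\frac{p(\omega_{gt})}{p(\omega')} \ge 1 \;\Rightarrow\; A\,[C_0, C_1, C_2, C_3, C_4]^T \le b$
$\max\,(C_0 + C_1 + C_2 + C_3 + C_4) \quad \text{s.t.} \quad A\,[C_0, \ldots, C_4]^T \le b, \;\; C_0, \ldots, C_4 \ge 0$
• Linear programming to solve it (GNU Linear Programming Kit)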
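Each degraded cover ω’ yields one linear constraint. This is because log p(ω_gt|Y) − log p(ω’|Y) ≥ 0 is linear in C0…C4, and the unweighted S_mot difference moves to the right-hand side. Below is a minimal sketch of how rows of A and b could be built, assuming the statistics are given as tuples (S_len, K, F, S_olp, S_app, S_mot). `constraint_row`, `satisfies`, and `log_post` are hypothetical helpers. The resulting A and b would then go to an LP solver (GLPK in the original work).

```python
def log_post(s, c):
    """Unnormalized log posterior for stats s = (S_len, K, F, S_olp, S_app, S_mot)."""
    return c[0] * s[0] - c[1] * s[1] - c[2] * s[2] - c[3] * s[3] - c[4] * s[4] - s[5]


def constraint_row(gt, deg):
    """Row (a, b) of A C <= b encoding log p(gt|Y) - log p(deg|Y) >= 0."""
    d = [g - x for g, x in zip(gt, deg)]   # differences: ground truth minus degraded
    # C0*d0 - C1*d1 - C2*d2 - C3*d3 - C4*d4 - d5 >= 0, rearranged to "<=" form
    a = [-d[0], d[1], d[2], d[3], d[4]]
    b = -d[5]
    return a, b


def satisfies(c, rows, tol=1e-12):
    """True if weights c = (C0..C4) satisfy every row a . c <= b."""
    return all(sum(ai * ci for ai, ci in zip(a, c)) <= b + tol for a, b in rows)
```

A row is satisfied exactly when the ground truth scores at least as high as the degraded cover under those weights, which the usage below checks for several weight vectors.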
Simulation experiments
• Settings
• K (unknown number) moving discs in a 200x200 scene
• Independent color appearance and motion
• Static occlusion and inter-occlusion
• False alarms
• Compared with MHT (I. Cox, 94), JPDAF (J. Kang, 03), and temporal-only association
• STDA score from the VACE-II evaluation
• Same motion and appearance likelihoods
• Averaged over multiple sequences and multiple runs
[Plots: FA=0, W=50, 10K MCMC iterations; K=5, W=50, 10K MCMC iterations]
Simulation experiments
• Online implementation
• Sliding window W
• Initialize $\omega_t$ with $\omega^*_{t-1}$
[Plot: online vs. offline comparison, T=1000]
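The online variant can be sketched as a driver loop. This is a minimal sketch under assumptions. `solve_window` is a hypothetical stand-in for running MCMC data association on the frames of one window. The window is assumed to advance one frame at a time. The warm start passes the previous optimum ω*_{t−1} as the initial state ω_t.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")
Cover = TypeVar("Cover")


def online_tracking(frames: Sequence[Frame], window: int,
                    solve_window: Callable[[Sequence[Frame], Optional[Cover]], Cover]
                    ) -> List[Cover]:
    """Slide a window of W frames and initialize omega_t with omega*_{t-1}."""
    best: Optional[Cover] = None
    results: List[Cover] = []
    for t in range(len(frames)):
        win = frames[max(0, t - window + 1): t + 1]  # the last W frames up to t
        best = solve_window(win, best)               # warm start from previous MAP
        results.append(best)
    return results
```

Warm-starting means each window's chain begins near a good cover instead of from scratch, which is the point of initializing ω_t with ω*_{t−1}.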
Real Scenarios
Experiments
[Results: CLEAR, 320x240; Vivid-II, 320x240]
Experiments
• Occlusions at the very beginning of a sequence can be handled by using symmetric temporal information
Geo-registration
• Use a 2D homography to compensate inter-frame (2-view) motion
• Refine the homography between the map and the images
$H_{i+1,M} = (H_{i,i+1})^{-1} H_{i,M} H_{update}$
[Figure: chain of frames registered to the map M; labels $H_{i,i+1}$, $H_{i,M}$, $H_{i+1,M}$, $H_{update}$]
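The update rule can be checked numerically. This is a minimal sketch that evaluates the slide's product H_{i+1,M} = (H_{i,i+1})^{-1} H_{i,M} H_update literally with 3x3 NumPy matrices and maps a pixel in homogeneous coordinates. The correct product order depends on whether points are treated as row or column vectors, which the slides do not state. Treat the column-vector convention in `apply_h`, and all helper names, as assumptions.

```python
import numpy as np


def chain_to_map(h_i_next, h_i_map, h_update):
    """H_{i+1,M} = (H_{i,i+1})^{-1} H_{i,M} H_update, normalized so H[2, 2] = 1."""
    h = np.linalg.inv(h_i_next) @ h_i_map @ h_update
    return h / h[2, 2]


def apply_h(h, x, y):
    """Map pixel (x, y) with homography h (column-vector convention assumed)."""
    p = h @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]


def translation(tx, ty):
    """Pure-translation homography, handy for sanity checks."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])
```

Translations commute, so the sanity check below does not depend on the unresolved convention. With an identity H_update the result is just the chained inter-frame estimate, and the geo-refinement enters only through H_update.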
Geo-registration results
Geo-mosaicing 2000 frames on top of the reference frame.
Experimental results
• Results are shown on two UAV data sets
• The map is acquired from Google Earth®
• Geo-registration is performed every 50 frames
• Local data association (MCMCDA) window: 50 frames
Geo-registration
[Figure: without geo-refinement vs. with geo-refinement]
Experimental results
System implementation
• C++ implementation
• Xeon Dual Core P4 3.0GHz
• Preliminary time performance
Procedure                                      Time (seconds) on 320x240
Image registration                             ~0.25
Motion detection (moving cameras)              ~2 (CPU) / ~0.1 (GPU)
Object detection after motion segmentation     ~0.25
Geo-registration                               ~6 every 50 frames
Tracking                                       ~0.4
Total                                          ~1 (GPU)
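Assuming the listed times are per frame (the table does not say so explicitly), amortizing the 6 s geo-registration over its 50-frame period gives 0.12 s per frame. The totals then work out as follows. The CPU line is derived here from the table entries, not reported in the table:

```latex
\begin{aligned}
t_{\text{GPU}} &\approx 0.25 + 0.1 + 0.25 + \tfrac{6}{50} + 0.4 = 1.12\ \text{s/frame} \approx 1\ \text{s/frame}\\
t_{\text{CPU}} &\approx 0.25 + 2 + 0.25 + \tfrac{6}{50} + 0.4 = 3.02\ \text{s/frame}
\end{aligned}
```

Motion detection dominates the CPU budget, which is why the GPU version brings the total close to 1 s per frame.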
Summary & Discussion
• Detection and tracking in dynamic scenes
  • Moving camera + rigidly moving objects
  • 2D motion segmentation and geometric analysis of the background
  • Spatial and temporal (2D+t) data association of moving objects
  • Tracking with geo-registration
• Highlights
  • Solutions to practical problems in the detection and tracking area
• Qian Yu and Gérard Medioni, "A GPU-based implementation of Motion Detection from a Moving Platform," to appear in IEEE Workshop on Computer Vision on GPU, in conjunction with CVPR'08.
• Qian Yu and Gérard Medioni, "Integrated Detection and Tracking for Multiple Moving Objects using Data-Driven MCMC Data Association," IEEE Workshop on Motion and Video Computing (WMVC'08), 2008.
• Qian Yu, Gérard Medioni, and Isaac Cohen, "Multiple Target Tracking Using Spatio-Temporal Monte Carlo Markov Chain Data Association," IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07), 2007, pp. 1-8.
• Qian Yu and Gérard Medioni, "Map-Enhanced Detection and Tracking from a Moving Platform with Local and Global Data Association," IEEE Workshop on Motion and Video Computing (WMVC'07), 2007.
• Yuping Lin, Qian Yu, and Gérard Medioni, "Map-Enhanced UAV Image Sequence Registration," Workshop on Applications of Computer Vision (WACV'07), 2007.
• Qian Yu, Isaac Cohen, Gérard Medioni, and Bo Wu, "Boosted Markov Chain Monte Carlo Data Association for Multiple Target Detection and Tracking," Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Vol. 2, pp. 675-678.