## Abstract

We present a three-dimensional (3D) object tracking method based on a Bayesian framework for tracking multiple, occluded objects in a complex scene. The 3D passive capture of scene data is based on integral imaging. The statistical characteristics of the objects versus the background are exploited to analyze each frame. The algorithm can work with objects with unknown position, rotation, scale, and illumination. Posterior probabilities of the reconstructed scene background and the 3D objects are calculated by defining their pixel intensities as Gaussian and gamma distributions, respectively, and by assuming appropriate prior distributions for estimated parameters. Multiobject tracking is achieved by maximizing the geodesic distance between the log-posteriors of the background and the objects. Experimental results are presented.

© 2011 Optical Society of America

## 1. INTRODUCTION

Three-dimensional (3D) tracking of multiple objects in a scene is of interest in many areas, including surveillance, robotics, and security. In some cases, objects of interest may be partially occluded, making tracking with two-dimensional (2D) images difficult due to the superposition of occlusion noise and object details. Tracking with 3D imaging offers advantages over 2D imaging systems because of its robustness to object occlusion and the possibility to track multiple objects moving in all 3D coordinates, including range estimation. Also, tracking may need to be robust to variations in object or background features, such as variations in object orientation and scene illumination.

There have been numerous approaches to detection, recognition, and tracking problems using 3D integral imaging [1, 2, 3, 4, 5, 6, 7], multiperspective imaging [8], or other techniques. One possible solution is contour-based object tracking [9, 10, 11, 12, 13]. These methods first require detection of the object; tracking then proceeds by moving the previous contour toward the current boundaries. Because the active contour method [13] evaluates changes of local intensities along the boundary, it is limited to small displacements. Region-based methods [10, 11], on the other hand, exploit information from both the object and the background for more robust and flexible performance.

In this paper, we present tracking of multiple occluded 3D objects using a region tracking method based on a statistical Bayesian formulation, with 3D integral imaging used for passive sensing and computational 3D scene reconstruction. It is assumed that the background is stationary in each frame. We also assume that the reconstructed pixel intensities of the background and of the multiple objects are independent identically distributed (IID) and, based on their grayscale images, follow Gaussian and gamma distributions, respectively. Within the Bayesian framework, posterior probabilities of the background and the objects are calculated by assuming appropriate prior distributions for the estimated parameters. At each incoming frame, the 3D scene is reconstructed, and the objects are located in 2D slices of the 3D reconstructed scene by maximizing the geodesic distance [14] between the log-posteriors of the reconstructed background and the objects to be tracked. Each object is then tracked individually in 3D space by maximizing this distance across all the 2D reconstructed planes.

In Section 2, we briefly describe the concepts of 3D passive image sensing and visualization. Our statistical Bayesian tracking algorithm is presented in Section 3. The experimental results are demonstrated in Section 4, followed by the summary and conclusion.

## 2. SYNTHETIC APERTURE INTEGRAL IMAGING AND COMPUTATIONAL RECONSTRUCTION

As illustrated in Fig. 1a, a camera array is used to acquire the elemental images from slightly different perspectives with respect to the scene. For computational reconstruction, each elemental image is projected through an associated virtual pinhole array to the desired reconstruction plane, where it is superimposed with the other projected elemental images [6]. The 3D scene can be computationally reconstructed plane by plane using this method, as depicted in Fig. 1b. Each elemental image is projected and superimposed with the magnification $M=d/f$, where *d* is the distance from the image sensor to the 3D object and *f* is the focal length of the camera lens. This enables visualization of the partially occluded objects, because only the reconstruction plane with the object of interest is in focus, while the occlusion and background are out of focus.
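The projection-and-superposition step above can be sketched numerically as a shift-and-sum operation. In the following minimal sketch (function and parameter names are illustrative, not from the paper), each elemental image is shifted in proportion to its camera index and to $f/d$, then the shifted images are averaged, so only objects at the chosen depth add up in register:

```python
import numpy as np

def reconstruct_plane(elemental, pitch, focal, depth, px_size):
    """Shift-and-sum reconstruction of one depth plane from a (K, K, H, W)
    stack of grayscale elemental images. The per-camera pixel shift scales
    with the camera baseline (pitch) and with f/d, the inverse of the
    magnification M = d/f. Wrap-around via np.roll stands in for the
    padding a full implementation would use."""
    K, _, H, W = elemental.shape
    shift = pitch * focal / (depth * px_size)  # pixels per camera index
    out = np.zeros((H, W))
    for i in range(K):
        for j in range(K):
            dy, dx = int(round(i * shift)), int(round(j * shift))
            out += np.roll(np.roll(elemental[i, j], dy, axis=0), dx, axis=1)
    return out / (K * K)  # average over the K*K superimposed projections
```

At the plane containing an object, the $K^2$ projections align and reinforce it, while occlusion and background average into a defocused blur; this is what makes the occluded objects visible in the reconstruction.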

## 3. TRACKING WITH THE BAYESIAN ALGORITHM

Object segmentation is applied on the reconstructed images for tracking. Reconstructed images are divided into the background region (${\mathbf{\Omega}}_{b}$) and the object region (${\mathbf{\Omega}}_{o}$) [15, 16, 17]. Our goal is to find the object region ${\mathbf{\Omega}}_{o}$ (in a statistical sense) matching the object support. We assume that the background and the objects are statistically independent and that the background is stationary for each frame. The objects’ pixel intensities are usually correlated; however, for simplicity, we assume that the pixel intensities of both the reconstructed background and the reconstructed objects are unknown, independent, and follow Gaussian and gamma distributions, respectively. We will present experimental results in Section 4 illustrating the tracking performance under these assumptions. In the following derivations, for simplicity, one-dimensional notation is used for the signals as $\mathbf{s}=\{{s}_{i}|i\in [1,N]\}$, where *N* is the total number of pixels. Let $\mathbf{w}=\{{w}_{i}|i\in [1,N]\}$ be a binary window that defines a support for objects, such that ${w}_{i}=1$ for object pixels (denoted as **o**) and ${w}_{i}=0$ for background pixels (denoted as **b**). The purpose of segmentation is to estimate the window function **w** for objects of interest in the reconstructed scene. Thus, each point on the reconstruction can be modeled as a spatially disjoint combination of object and background as

$${s}_{i}={o}_{i}{w}_{i}+{b}_{i}(1-{w}_{i}),\qquad i\in [1,N].$$
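The disjoint object/background model can be illustrated with synthetic data; the distributions and parameter values below are placeholders chosen for the demonstration, not the paper's estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
w = np.zeros(N)
w[300:500] = 1                                    # binary object support
b = rng.normal(100.0, 10.0, N)                    # Gaussian background pixels
o = rng.gamma(shape=2.3, scale=1 / 0.03, size=N)  # gamma object pixels (rate beta = 0.03)
s = w * o + (1 - w) * b                           # spatially disjoint combination
```

Each pixel of `s` comes from exactly one of the two processes, selected by the window `w`; estimating that window is the segmentation task.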

Several optimal criterion laws [18] have been derived for situations in which the statistical properties of the object and the background belong to the exponential family (e.g., gamma or Gaussian). In 3D integral imaging reconstruction of the scene, the optical rays generated by the elemental images are superimposed; by the central limit theorem, the background region of the reconstructed images therefore tends to be Gaussian distributed. The statistical behavior of various objects may differ, so a gamma distribution is chosen as a robust statistical distribution to capture the object pixel distributions: by adjusting its parameters, we can approximate various distributions. Also, the object statistics at different times (frames) or at different object poses or orientations may vary; the gamma distribution parameters can be estimated to capture such variations of the object.

#### 3A. Background Region Statistics

By assuming a Gaussian distribution for the background region, one can write the probability density function (PDF) as follows:

$${P}_{b}({s}_{i}|\mu ,{\sigma}^{2})=\frac{1}{\sqrt{2\pi {\sigma}^{2}}}\mathrm{exp}\left[-\frac{{({s}_{i}-\mu )}^{2}}{2{\sigma}^{2}}\right],\qquad i\in {\mathbf{\Omega}}_{b},$$

where *μ* and ${\sigma}^{2}$ are the mean and the variance for the background region, respectively.

We estimate the unknown variables *μ* and ${\sigma}^{2}$ by maximizing the conditional probability ${\mathbf{P}}_{b}(\mu ,{\sigma}^{2}|\mathbf{w},\mathbf{s})$:

That is, the unknown parameters are estimated in the maximum *a posteriori* (MAP) sense. According to Bayes’s rule [19], the conditional probability can be rewritten as

$${P}_{b}(\mu ,{\sigma}^{2}|\mathbf{w},\mathbf{s})\propto {P}_{b}(\mathbf{s}|\mathbf{w},\mu ,{\sigma}^{2}){P}_{b}({\sigma}^{2}|\mu ){P}_{b}(\mu ).$$

We assume that ${P}_{b}(\mu )$ is uniformly distributed and ${P}_{b}({\sigma}^{2}|\mu )\propto 1/{\sigma}^{2}$. Taking Eq. (1) into account, we write the likelihood ${P}_{b}(\mathbf{s}|\mathbf{w},\mu ,{\sigma}^{2})$ as follows:

$${P}_{b}(\mathbf{s}|\mathbf{w},\mu ,{\sigma}^{2})=\prod _{i\in {\mathbf{\Omega}}_{b}}\frac{1}{\sqrt{2\pi {\sigma}^{2}}}\mathrm{exp}\left[-\frac{{({s}_{i}-\mu )}^{2}}{2{\sigma}^{2}}\right].$$

In order to derive the MAP of unknown parameters, one has to take partial derivatives of the log-posterior function with respect to *μ* and ${\sigma}^{2}$, and set each to zero as follows:

where *N* is the total number of input image pixels.
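Under the priors above, the resulting estimates are essentially the sample statistics of the background pixels. A minimal sketch (names are illustrative; the exact denominator correction contributed by the $1/{\sigma}^{2}$ prior is omitted here):

```python
import numpy as np

def background_stats(s, w):
    """MAP-style estimates of the Gaussian background parameters from the
    pixels where the binary window w is 0. With a flat prior on mu, the
    estimate of mu is the sample mean; the variance estimate is the
    sample second moment about that mean."""
    bg = s[w == 0]
    mu = bg.mean()
    var = ((bg - mu) ** 2).mean()
    return mu, var
```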

#### 3B. Object Region Statistics

By assuming an IID gamma distribution for the pixels of the multiple 3D objects to be tracked, one can write the PDF of the object *j* as follows:

$${P}_{oj}({s}_{i}|\alpha ,\beta )=\frac{{\beta}^{\alpha}}{\mathrm{\Gamma}(\alpha )}{s}_{i}^{\alpha -1}\mathrm{exp}(-\beta {s}_{i}),\qquad i\in {\mathbf{\Omega}}_{o},$$

where *j* denotes the index of the objects to be tracked, $\mathrm{\Gamma}(\cdot )$ is the gamma function, and $\alpha >0$ and $\beta >0$ are the shape and rate parameters.

We assume that the shape parameter *α* is known and that the rate parameter *β* has a gamma prior distribution, $\pi (\beta )\equiv \text{gamma}({\alpha}_{0},{\beta}_{0})$, where ${\alpha}_{0}$ and ${\beta}_{0}$ are known and are selected based on the types of objects and scenes used in the experiments. One can derive the *posterior* distribution of each object region:

$$\pi (\beta |\mathbf{s})\equiv \text{gamma}\left({\alpha}_{0}+{N}_{o}\alpha ,\;{\beta}_{0}+\sum _{i\in {\mathbf{\Omega}}_{o}}{s}_{i}\right),$$

where ${N}_{o}$ is the number of pixels in the object region.

The posterior distribution of the object region is also gamma distributed, but with different parameters. The Bayes estimator of *β* under squared-error loss is the posterior mean [19]:

$$\hat{\beta}=\frac{{\alpha}_{0}+{N}_{o}\alpha}{{\beta}_{0}+\sum _{i\in {\mathbf{\Omega}}_{o}}{s}_{i}},$$

where ${N}_{o}$ denotes the number of object-region pixels.
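Assuming the rate parameterization of $\text{gamma}({\alpha}_{0},{\beta}_{0})$ (an assumption made for this sketch), the conjugate update and the posterior-mean estimator take one line each; names are illustrative:

```python
import numpy as np

def beta_posterior_mean(s_obj, alpha, a0, b0):
    """Bayes estimator of the gamma rate beta under squared-error loss.
    With known shape alpha and a gamma(a0, b0) prior on beta (rate
    parameterization assumed), the posterior is
    gamma(a0 + n*alpha, b0 + sum(s)) and the estimator is its mean."""
    n = s_obj.size
    return (a0 + n * alpha) / (b0 + s_obj.sum())
```

For a large object region the prior terms are negligible and the estimate approaches `alpha / mean(s_obj)`, i.e., the rate implied by the sample mean; this is why the frame-to-frame estimates track the changing object statistics.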

#### 3C. 3D Tracking with the Bayesian Algorithm

The tracking of objects can be modeled as an estimation problem. Our objective is to estimate the object subspace in a 3D stack of reconstructed planes obtained by integral imaging. Assume that the initial positions of the objects lie in some regions at unknown reconstructed planes in the 3D space. The objects are tracked individually, because they may be located at different depths. For the first frame, starting with an arbitrary reconstructed plane *p* between the occlusion and the background, we first seek to locate the objects individually, which is analogous to maximizing the geodesic distance [14] between the log-posteriors of the object *j* and the background:

where *p* denotes the reconstructed plane and *j* is the index of the objects to be tracked. Thus, for an object *j* at the reconstructed plane *p*, **w** is the optimal binary window and the optimal segmentation.

Equation (10) can be interpreted as the snake energy [13]. The maximization is done by applying a level-set method [20]. Let $\mathrm{\Gamma}(t)$ be the object surface at time *t*; the embedding level set is defined as $\phi (\mathbf{q},t)$, where **q** denotes a point in the level set, such that $\phi (\mathbf{q},t)<0$ represents the object region and $\phi (\mathbf{q},t)>0$ represents the background region. Our object surface can be explicitly written as $\mathrm{\Gamma}(t={t}_{0})=\{\mathbf{q}|\phi (\mathbf{q},t={t}_{0})=0\}$. It can be shown [10, 20] that if each point propagates along the normal to the interface ($\stackrel{\rightharpoonup}{n}=\nabla \phi /|\nabla \phi |$), then the evolution of $\mathrm{\Gamma}(t)$ can be modeled by the discrete space–time partial differential equation $\phi (\mathbf{q},t+1)=\phi (\mathbf{q},t)+F(\mathbf{q})\Vert \nabla \phi \Vert $, where $F(\mathbf{q})$ represents the speed function.
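One step of this discrete evolution can be sketched with central-difference gradients (a simplified illustration; in the paper the speed function is derived from the energy ${\epsilon}_{pj}$, whereas here it is passed in as an arbitrary scalar or per-pixel array):

```python
import numpy as np

def levelset_step(phi, F):
    """One discrete level-set update: phi <- phi + F * |grad(phi)|.
    phi < 0 marks the object region, phi > 0 the background; F is the
    speed function (scalar or per-pixel array)."""
    gy, gx = np.gradient(phi)  # central differences on the interior
    return phi + F * np.sqrt(gx ** 2 + gy ** 2)
```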

Thus, the maximization problem is analogous to computing the derivatives of ${\epsilon}_{pj}(\mathbf{s},\mathbf{w})$ with respect to **s**. The corresponding Euler–Lagrange equation is $\partial {\epsilon}_{pj}(\mathbf{s},\mathbf{w})/\partial \mathbf{s}=({\epsilon}_{pj}(\mathbf{s},\mathbf{w}))\stackrel{\rightharpoonup}{n}$, where $\stackrel{\rightharpoonup}{n}$ is the outward normal to the object surface. Following Ref. [10], the speed function can be rewritten as

Then each object is tracked individually in 3D space by maximizing the distance in Eq. (10) across all the reconstructed planes of interest:

$${\hat{p}}_{j}=\mathrm{arg}\underset{p}{\mathrm{max}}\;{\epsilon}_{pj}(\mathbf{s},\mathbf{w}).$$
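The per-plane maximization can be wrapped in a search over depth; in this minimal sketch, `score` stands in for the maximized object/background distance at each reconstructed plane (names are illustrative):

```python
import numpy as np

def track_depth(planes, score):
    """Return the index of the reconstructed plane that maximizes a
    per-plane separability score (a stand-in for the maximized
    object/background distance), together with that score."""
    vals = [score(p) for p in planes]
    k = int(np.argmax(vals))
    return k, vals[k]
```

Repeating this search frame by frame yields the object's depth trajectory in addition to its in-plane position.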

## 4. EXPERIMENTAL RESULTS

For the optical experiments, two cars with unknown position, rotation, and illumination, in the presence of unknown occlusion and background, are used as the objects to be tracked [see Fig. 2a]. The objects are shown in Fig. 2b, and the background is shown in Fig. 2c. Elemental (multiview) images for this scene are captured as illustrated in Fig. 1. Each elemental image has $2784\times 1856$ pixels. Sample elemental images with various perspectives are shown in Fig. 2d. A camera lens with $f=50\text{\hspace{0.17em}}\mathrm{mm}$ is used. Our camera is located at distance $Z=0\text{\hspace{0.17em}}\mathrm{mm}$; the occlusion is located at $Z=190\text{\hspace{0.17em}}\mathrm{mm}$; the two cars to be tracked are initially located at $Z=380\text{\hspace{0.17em}}\mathrm{mm}$ and $Z=410\text{\hspace{0.17em}}\mathrm{mm}$, respectively; and the background is located at $Z=690\text{\hspace{0.17em}}\mathrm{mm}$. The two objects (cars) are moved randomly, and 60 positions for both objects are recorded. The 3D movements of the objects are shown in Fig. 3.

Our background region is modeled as a Gaussian distribution with mean *μ* and variance ${\sigma}^{2}$, whose values can be estimated from Eq. (6). Those estimates vary and depend on the scene and on the illumination conditions. For the object region, *α* may differ from object to object and usually varies between 2 and 5; for our case, we set $\alpha =2.3$ based on experiments. In our experiment, *β* follows $\text{gamma}({\alpha}_{0},{\beta}_{0})$, with ${\alpha}_{0}=2$ and ${\beta}_{0}=0.05$. The estimate of *β* varies between 0.015 and 0.05 from frame to frame, because the pixel statistics change with the objects' positions, rotations, and illuminations.

The tracking of heavily occluded objects is usually difficult, and changes in object rotation or illumination add further complexity. Consider that the two objects [see Fig. 2b] move and rotate between the occlusion and the background from frame to frame with varying illumination. In general, 2D algorithms may fail to track in this case. Experimental results with the 2D optimal object tracking algorithm presented in [17] are shown in Fig. 4. It can be seen that the performance of the 2D imaging approach is quite poor and the objects cannot be tracked. However, our 3D tracking method performs reasonably well for this scene despite changes in the orientation and illumination of the objects. Reconstruction from the elemental images for the first frame using the 3D computational method is shown in Fig. 5. For comparison, tracking with and without occlusion is performed. The performance for the first frame is shown in Fig. 6 (Media 1, Media 2, Media 3, and Media 4). Tracking examples of both cars are shown in Fig. 7 (Media 5 and Media 6). 3D tracking experiments are performed with varying orientation and illumination. Illumination is reduced by half in Fig. 7b, and car 2 and the opposite side of car 1 are tracked. Both cars are rotated in the tracking results of Fig. 7c. Car 2 is rotated for the tracking results in Fig. 7d. Illumination is doubled, and car 1 is rotated by 135 degrees, for the tracking results in Fig. 7e.

## 5. CONCLUSIONS

We have presented a Bayesian framework for tracking multiple objects in 3D space using a region tracking method based on a statistical Bayesian formulation and 3D integral imaging. The proposed method is robust to partial occlusion and an unknown background scene, and it works with objects with unknown position, range, rotation, scale, and illumination. In the proposed tracking algorithm, the reconstructed pixel intensities of the background and the objects are assumed to follow Gaussian and gamma distributions, respectively. By assuming appropriate priors, posterior distributions of the background and the objects can be calculated. Multiobject tracking is achieved by maximizing the geodesic distance between the log-posteriors of the 3D reconstructed background and the objects. We have shown that the statistical Bayesian formulation used with 3D integral imaging provides a promising technique for tracking objects in 3D space.

## ACKNOWLEDGMENTS

We wish to thank Dr. M. Daneshpanah and Prof. Dipak Dey for many useful discussions. This work was supported by the Defense Advanced Research Projects Agency (DARPA) and by the Air Force Research Laboratory under FA8650-07-C-7740.

**1. **G. Lippmann, “La photographie intégrale,” C. R. Acad. Sci. **146**, 446–451 (1908).

**2. **A. Stern and B. Javidi, “3D image sensing, visualization, and processing using integral imaging,” Proc. IEEE **94**, 591–608 (2006). [CrossRef]

**3. **F. Okano, J. Arai, K. Mitani, and M. Okui, “Real-time integral imaging based on extremely high resolution video system,” Proc. IEEE **94**, 490–501 (2006). [CrossRef]

**4. **J. S. Jang and B. Javidi, “Three-dimensional synthetic aperture integral imaging,” Opt. Lett. **27**, 1144–1146 (2002). [CrossRef]

**5. **S. Hong and B. Javidi, “Improved resolution 3D object reconstruction using computational integral imaging with time multiplexing,” Opt. Express **12**, 4579–4588 (2004). [CrossRef] [PubMed]

**6. **B. Javidi, R. Ponce-Diaz, and S.-H. Hong, “Three-dimensional recognition of occluded objects by using computational integral imaging,” Opt. Lett. **31**, 1106–1108 (2006). [CrossRef] [PubMed]

**7. **M. Cho and B. Javidi, “Three-dimensional tracking of occluded objects using integral imaging,” Opt. Lett. **33**, 2737–2739 (2008). [CrossRef] [PubMed]

**8. **M. Pollefeys, L. Van Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, and R. Koch, “Visual modeling with a hand-held camera,” Int. J. Comput. Vis. **59**, 207–232 (2004). [CrossRef]

**9. **M. DaneshPanah and B. Javidi, “Segmentation of 3D holographic images using bivariate jointly distributed region snake,” Opt. Express **14**, 5143–5153 (2006). [CrossRef] [PubMed]

**10. **M. DaneshPanah and B. Javidi, “Tracking biological microorganisms in sequence of 3D holographic microscopy images,” Opt. Express **15**, 10761–10766 (2007). [CrossRef] [PubMed]

**11. **C. Chesnaud, V. Page, and P. Réfrégier, “Improvement in robustness of the statistically independent region snake-based segmentation method of target-shape tracking,” Opt. Lett. **23**, 488–490 (1998). [CrossRef]

**12. **A. Yilmaz, X. Li, and M. Shah, “Contour based object tracking with occlusion handling in video acquired using mobile cameras,” IEEE Trans. Pattern Anal. Mach. Intell. **26**, 1531–1536 (2004). [CrossRef] [PubMed]

**13. **M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: active contour models,” in Proceedings of the International Conference on Computer Vision (IEEE, 1987), pp. 259–268. [CrossRef]

**14. **T. Georgiou, “Distances and Riemannian metrics for spectral density functions,” IEEE Trans. Signal Process. **55**, 3995–4003 (2007). [CrossRef]

**15. **O. Germain and P. Réfrégier, “Optimal snake-based segmentation of a random luminance target on a spatially disjoint background,” Opt. Lett. **21**, 1845–1847 (1996). [CrossRef] [PubMed]

**16. **B. Javidi, P. Réfrégier, and P. Willett, “Optimum receiver design for pattern recognition with nonoverlapping signal and scene noise,” Opt. Lett. **18**, 1660–1662 (1993). [CrossRef] [PubMed]

**17. **F. Goudail and P. Réfrégier, “Optimal target tracking on image sequences with a deterministic background,” J. Opt. Soc. Am. A **14**, 3197–3207 (1997). [CrossRef]

**18. **C. Chesnaud, P. Réfrégier, and V. Boulet, “Statistical region snake-based segmentation adapted to different physical noise models,” IEEE Trans. Pattern Anal. Mach. Intell. **21**, 1145–1157 (1999). [CrossRef]

**19. **N. Mukhopadhyay, *Probability and Statistical Inference* (Marcel Dekker, 2000).

**20. **J. Sethian, *Level Set Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Material Sciences* (Cambridge University Press, 1999).