IEEE SIGNAL PROCESSING LETTERS, VOL. 21, NO. 2, FEBRUARY 2014
3-D Face Recognition Using Curvelet Local Features
S. Elaiwat, M. Bennamoun, F. Boussaid, and A. El-Sallam
Abstract—In this letter, we present a robust single modality feature-based algorithm for 3-D face recognition. The proposed algorithm exploits the Curvelet transform not only to detect salient points on the face but also to build multi-scale local surface descriptors that can capture highly distinctive rotation/displacement invariant local features around the detected keypoints. This approach is shown to provide robust and accurate recognition under varying illumination conditions and facial expressions. Using the well-known and challenging FRGC v2 dataset, we report a superior performance compared to other algorithms, with a 97.83% verification rate for probes with all facial expressions.

Index Terms—Digital curvelet transform, face recognition, local features.
I. INTRODUCTION
DESPITE decades of research efforts, 2-D recognition systems remain sensitive to variations in illumination, pose and facial expressions. The use of commercially available 3-D imaging devices has been shown to overcome sensitivity to illumination and pose [1]. In addition, 3-D face recognition exploits structural information about the face, such as geodesic distances and surface curvatures. However, variations in facial expressions still constitute a major challenge because they can result in important geometrical facial changes [1]. Reported approaches to the problem of face recognition can be classified into three categories [2]: (i) holistic matching algorithms, which use the whole face region for recognition; (ii) local feature-based matching algorithms, which extract local features from some facial regions (e.g. eyes and nose); and (iii) hybrid matching algorithms, which combine holistic and local feature-based matching, but at the expense of a greater computational cost. Among these approaches, local feature-based matching algorithms are potentially the most effective in that they can exclude those facial regions that are most affected by perturbations such as changes in facial expression or spurious elements. They are also robust to occlusion and clutter [3]. Their performance depends on their ability to extract distinctive local facial features. Curvelet theory provides a powerful framework to extract such key local facial features. Unlike the isotropic elements of other transforms such as wavelets, the needle-shaped elements of the Curvelet transform have a very high directional sensitivity and anisotropy, making them well suited for curvature representation. Other directional transforms such as Gabor wavelets and the Dual-Tree Complex Wavelet Transform (DTCWT) only cover part of the spectrum in the frequency domain. The Ridgelet transform, on the other hand, is only applicable to objects with global straight-line singularities, which are rarely observed in real applications [4]. The main contributions of this paper lie in exploiting the Curvelet transform in two novel ways:

A) As a keypoint detector to extract salient points on the face. The identification of keypoints is undertaken in the Curvelet domain by examining Curvelet coefficients in each subband. Given that high Curvelet coefficients are indicative of high global variations at coarse resolutions and of high local variations at fine resolutions, the idea is to retain dominant coefficients associated with the mid-bands of the Curvelet decomposition. This is done so as to lessen the bias introduced by noise, face boundary endpoints or other spurious elements. Because each keypoint is represented by a scale, orientation, Curvelet position, spatial position and magnitude, the identification of the detected keypoints is robust and repeatable in the Curvelet domain. Given that our keypoint detector is based on anisotropically scaled basis functions, it can capture a large variety of geometrical features, including curves and edges. Other keypoint detectors, such as the Difference of Gaussians (DoG) in SIFT, act as "isotropic" filters, thereby requiring an additional step to detect edges (used to detect keypoints) [5].

B) As multi-scale local surface descriptors that can extract highly distinctive features around the detected keypoints. Unlike previously reported Curvelet-based works ([6], [7]), which extract global features (e.g. PCA, entropy, standard deviation and mean) from an entire 2-D face, our descriptor operates on depth facial images only. In addition, it exploits the Curvelet decomposition (scale, orientation and Curvelet position) to construct highly descriptive local (rather than global) features around the detected keypoints of all subbands. To overcome the sensitivity of Curvelets to rotation, the elements (descriptors) of the feature vectors are reordered (reoriented) based on the orientation of the detected keypoints, thus resulting in rotation-invariant local features. Because our descriptor is constructed in the Curvelet domain, accurate localization of the keypoints is not required. In contrast, other keypoint detectors require an additional step to invert back accurately to the spatial domain [5].

Manuscript received August 29, 2013; revised October 26, 2013; accepted December 06, 2013. Date of publication December 13, 2013; date of current version January 03, 2014. This work was supported by the ARC under DP110102166. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jing-Ming Guo. S. Elaiwat, M. Bennamoun, and A. El-Sallam are with the School of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia (e-mail: [email protected]). F. Boussaid is with the School of Electrical, Electronic and Computer Engineering, The University of Western Australia, Perth, Australia. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LSP.2013.2295119
1070-9908 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
found by examining all keypoints in that share the same spatial position as keypoint . refers to the maximum magnitude:
(5)
Fig. 1. Illustration of a 4-scale Curvelet decomposition.
II. ALGORITHM DESCRIPTION

A. Keypoint Detector

The Discrete Curvelet transform decomposes each image into a set of frequency and angle decomposition components , as shown in Fig. 1. The angle decomposition process produces identical subband coefficients at angles and for the same scale. As a result, only half of the subbands need to be considered. Given a 3-D face image (depth image), the face Curvelet coefficients are determined using the Fast Discrete Curvelet Transform (FDCT) defined by [8],

(1)
where is a discrete Curvelet coefficient at scale , subband , at position , while and are the discrete mother Curvelet and the 2-D array (e.g. image), respectively. To extract keypoints from the Curvelet coefficients (Fig. 2(b)), a keypoint selection measure based on the coefficients' magnitude is used in each scale:

(2)

where is the set of keypoints at scale , while represents the mean value of all coefficients at scale . The mean value has previously been successfully applied as a threshold on the Curvelet transform for feature extraction and coefficient filtering [9]. This work uses the mean value for keypoint selection (Fig. 2(c)). However, since the keypoints are extracted from all subbands of a given scale, several keypoints could invert back to the same spatial location or region. This would result in redundant features. In order to address this, a keypoint in set is discarded if:

(3)

where
found by examining all keypoints present in a patch centered around spatial position . The weighting factor in Eq. (3) was set to 80% to keep only the most significant keypoints in each patch. Fig. 2(b) and Fig. 2(c) illustrate the keypoint detector algorithm at a given scale . In our experiments, we applied a four-scale Curvelet transform. The coarsest (first) scale and the finest (last) scale were discarded because (i) neither has an angle decomposition and (ii) information in the lowest (flat surfaces) and highest (noise and image boundaries) frequency bands is rarely significant. Furthermore, all Curvelet coefficients falling on the face boundaries and the mouth area, including the chin/beard (areas below the nose), were automatically excluded as they can be affected by changes in facial expression. This was done by creating a reference line below the nose tip to exclude the Curvelet coefficients belonging to the mouth and chin/beard.

B. Local Feature Representation

Once the dominant keypoints are identified (detected), unique multi-scale local surface descriptors are extracted around each of these keypoints, considering each subband of a given scale (Fig. 2(d)). To explain how these descriptors (features) are extracted, let us consider one of these keypoints, . Let be the corresponding position of the keypoint in each subband . The first step is to extract sub-patches around each position in order to form a 3-D patch defined as

(6)

where is a discrete Curvelet coefficient at scale , subband , and position . The size of the first dimension of represents the number of subbands in scale . The other two dimensions represent the size of the sub-patches, which determines the degree of locality of the features. A smaller sub-patch results in weaker descriptiveness, while a larger sub-patch exhibits a higher sensitivity to facial expressions.
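As a minimal illustration of the mean-magnitude selection rule of Eq. (2) and the 80% pruning of Eq. (3), the sketch below assumes the Curvelet coefficients of one scale are available as a list of per-subband complex arrays. The function and variable names are ours, not the authors', and for simplicity the coefficient indices are treated directly as spatial positions (the paper inverts coefficients back to true spatial locations):

```python
import numpy as np

def select_keypoints(subbands):
    """Eq. (2) sketch: keep coefficients whose magnitude exceeds the
    mean magnitude of all coefficients at this scale.

    subbands : list of 2-D complex coefficient arrays, one per subband.
    Returns a list of (subband, row, col, magnitude) tuples.
    """
    mags = [np.abs(c) for c in subbands]
    # Scale-level threshold: mean magnitude over every subband coefficient.
    mean_mag = np.mean(np.concatenate([m.ravel() for m in mags]))
    kps = []
    for b, m in enumerate(mags):
        for r, c in zip(*np.nonzero(m > mean_mag)):
            kps.append((b, int(r), int(c), float(m[r, c])))
    return kps

def prune_keypoints(kps, patch=5, alpha=0.8):
    """Eq. (3) sketch: discard a keypoint whose magnitude falls below
    alpha (80%) of the maximum magnitude among keypoints located in
    the same patch x patch neighbourhood."""
    kept = []
    for b, r, c, mag in kps:
        local_max = max(m for _, rr, cc, m in kps
                        if abs(rr - r) <= patch // 2 and abs(cc - c) <= patch // 2)
        if mag >= alpha * local_max:
            kept.append((b, r, c, mag))
    return kept
```

The pruning pass keeps only the locally dominant detections, which is how the paper avoids several subbands contributing redundant keypoints at the same spatial location.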
Experimental tests conducted with varying sub-patch sizes have shown that a sub-patch size of offers the best trade-off between descriptiveness and sensitivity to facial expressions. This was true for both scales 2 and 3. Extracting sub-patches at different scales using an adaptive patch size is also possible, but at the cost of a more complex matching process, since the feature vectors would then be of different sizes. After extraction, each 3-D patch is scaled by a weighting Gaussian window ,
(7)
is the maximum magnitude: (4)
where is the Euclidean distance of a coefficient to the keypoint; thus more emphasis is given to the keypoint. The size of the
Fig. 2. Block diagram of the keypoint detection and feature extraction algorithm (best seen in color): (a) Curvelet transform, (b) keypoint detection at scale , (c) keypoint selection and spatial representation, and (d) feature extraction around keypoint .
Fig. 3. Verification and identification results for the FRGC v2 dataset.
sub-patch was fixed to 5, while the optimal value of the parameter was determined experimentally to be for both scales 2 and 3. The scaled patch is defined by

(8)

Building a feature vector directly from the resulting scaled sub-patches is not advisable because Curvelet coefficients are sensitive to rotation. To address this issue, we propose to first reorder each scaled patch w.r.t. the keypoint orientation using a circular shift:

(9)
where represents the keypoint sub-patch (written in bold to illustrate the shift). This is done so as to keep the keypoint sub-patch always at a fixed position (e.g. the second position in Eq. (9)) and to preserve the relative subband order. This ensures rotation invariance for each feature vector , which is
constructed by concatenating all elements of a patch into one vector. This feature extraction is applied for scales 2 and 3 (Fig. 2(c)), as stated previously. To calculate the similarity between a probe and a gallery face, corresponding features at each scale are matched using the cosine rule,

(10)

where and refer to probe and gallery local facial features, respectively. When the two features are exactly equal, will be 1, corresponding to a perfect match. For each feature (row), the best match in the similarity matrix is included in one vector , which is constructed by concatenating all of the best matches found from all features. The matching score between a given probe face and a given gallery face is calculated as the mean value of the best matches in vector . Note that for each face, we have two types of features: scale 2 and scale 3 features. We can match probe and gallery faces based on each type (scale) of features separately, or fuse scales 2 and 3 at the score level using weighted sum fusion [3].
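The descriptor construction of Eqs. (7)-(9) and the cosine matching of Eq. (10) can be sketched as follows. The Gaussian parameter `sigma` and the fixed subband index `fixed_pos` are illustrative assumptions (the experimentally determined values are not reproduced here), and the 3-D patch is assumed to be already extracted as a `(subbands, size, size)` complex array per Eq. (6):

```python
import numpy as np

def gaussian_window(size, sigma):
    """2-D Gaussian centred on the sub-patch (cf. Eq. (7)): coefficients
    closer to the keypoint receive a higher weight."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))

def descriptor(patch, keypoint_band, sigma=1.0, fixed_pos=1):
    """Weight the 3-D patch by the Gaussian window (Eq. (8)), circularly
    shift the subband axis so the keypoint's own subband always lands at
    index `fixed_pos` (Eq. (9)), then concatenate into one feature
    vector."""
    weighted = patch * gaussian_window(patch.shape[1], sigma)
    shifted = np.roll(weighted, fixed_pos - keypoint_band, axis=0)
    return np.abs(shifted).ravel()

def cosine_similarity(fp, fg):
    """Eq. (10) sketch: cosine similarity between probe and gallery
    features; equals 1 for a perfect match."""
    return float(np.dot(fp, fg) / (np.linalg.norm(fp) * np.linalg.norm(fg)))
```

Because the subband axis is rolled relative to the keypoint's own subband, a face rotation that cyclically permutes the subbands leaves the feature vector unchanged, which is the rotation-invariance property the circular shift of Eq. (9) provides.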
TABLE I
PERFORMANCE EVALUATION ON THE FRGC v2 DATASET
III. EXPERIMENTAL RESULTS

We performed our experiments on the FRGC v2 dataset, which includes 3-D face scans partitioned into a training set and a validation set. 4007 3-D scans of 466 subjects were used for validation. 466 scans under neutral expression were taken to build a gallery, while the remaining 3541 scans, representing probes, were divided into neutral (1944 images) and non-neutral (1597 images) expression categories. The dataset exhibits large variations in facial expression and illumination conditions but limited pose variations. A more detailed description of the FRGC v2 dataset can be found in [10]. Fig. 3 reports our identification and verification results using scale 2, scale 3 and both scales combined. In all cases, scale 2 performs slightly better than scale 3, because more distinctive features can be found at scale 2. Table I reports identification and verification rates when combining scales 2 and 3, for the neutral and non-neutral cases. For both cases, our algorithm achieves higher recognition rates than state-of-the-art approaches, including geometric features [11], [12], [13], local features [14] and optimized ICP [15]. The reduction of the Curvelet coefficients' sensitivity to rotation through the circular shift (Eq. (9)) of the feature vectors was also evaluated using all 1597 non-neutral faces of the FRGC v2 dataset. The resulting identification rates were 86.2% and 90.4% without and with the circular shift, respectively. In addition to the FRGC v2 dataset, experiments were conducted on the BU-3DFE dataset, which exhibits larger expression variations [16]. The dataset contains a total of 2500 facial expressions, distributed over 100 subjects, with six different facial expressions at four levels of intensity per subject [16]. The resulting identification rate at 0.1 FAR was 98.21% when combining scales 2 and 3. This is comparable to the most recently reported work by Lei et al.
[17], who achieved 98.20% but had to rely on special masks to isolate the forehead and nose regions. The computational cost of our keypoint detector and surface descriptor was evaluated on a standard desktop with an Intel Core i7 3.4 GHz processor and 8.0 GB RAM. On average, for a standard depth image of the dataset, it takes to detect all keypoints and to build all descriptors around each detected keypoint. Given that the implemented Curvelet transform is based on the FFT, the proposed approach is well suited to real-time FPGA or GPU implementations [18].

IV. CONCLUSION

We presented a novel Curvelet-based feature extraction algorithm for 3-D face recognition. The algorithm first identifies important keypoints on the face by examining Curvelet coefficients in each subband. Such an identification is shown to be robust and repeatable because each keypoint is represented by
scale, orientation, Curvelet position, spatial position and magnitude. Rotation-invariant multi-scale local surface descriptors are then built around the detected keypoints to extract highly distinctive facial features for robust feature matching. Experiments performed on the FRGC v2 dataset have shown that the proposed algorithm is robust and accurate under varying facial expressions, with a superior verification rate of 97.83%.

REFERENCES

[1] K. Bowyer, K. Chang, and P. Flynn, "A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition," Comput. Vis. Image Understand., vol. 101, no. 1, pp. 1–15, 2006.
[2] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: A literature survey," ACM Comput. Surv., vol. 35, no. 4, pp. 399–458, 2003.
[3] A. Mian, M. Bennamoun, and R. Owens, "An efficient multimodal 2D-3D hybrid approach to automatic face recognition," IEEE Trans. Patt. Anal. Mach. Intell., vol. 29, no. 11, pp. 1927–1943, 2007.
[4] J. Ma and G. Plonka, "A review of curvelets and recent applications," IEEE Signal Process. Mag., 2009.
[5] J. Fauqueur, N. Kingsbury, and R. Anderson, "Multiscale keypoint detection using the dual-tree complex wavelet transform," in IEEE Int. Conf. Image Processing, 2006.
[6] T. Mandal, Q. J. Wu, and Y. Yuan, "Curvelet based face recognition via dimension reduction," Signal Process., vol. 89, no. 12, pp. 2345–2353, 2009.
[7] S. Rahman, S. Naim, A. Al Farooq, and M. Islam, "Curvelet texture based face recognition using principal component analysis," in Int. Conf. Computer and Information Technology (ICCIT), 2010.
[8] E. Candès, L. Demanet, D. Donoho, and L. Ying, "Fast discrete curvelet transforms," Multiscale Model. Simul., vol. 5, no. 3, pp. 861–899, 2006.
[9] I. Sumana, M. Islam, Z. Dengsheng, and L. Guojun, "Content based image retrieval using curvelet transform," in IEEE Workshop on Multimedia Signal Processing, 2008.
[10] P. Phillips, P. Flynn, T. Scruggs, K. Bowyer, J. Chang, K. Hoffman, J. Marques, M. Jaesik, and W. Worek, "Overview of the face recognition grand challenge," in IEEE Comput. Soc. Conf. CVPR, 2005, vol. 1.
[11] P. Liu, Y. Wang, D. Huang, Z. Zhang, and L. Chen, "Learning the spherical harmonic features for 3-D face recognition," IEEE Trans. Image Process., vol. 22, no. 3, 2013.
[12] D. Smeets, J. Keustermans, D. Vandermeulen, and P. Suetens, "meshSIFT: Local surface features for 3D face recognition under expression variations and partial data," Comput. Vis. Image Understand., vol. 117, no. 2, 2013.
[13] Y. Ming and Q. Ruan, "Robust sparse bounding sphere for 3D face recognition," Image Vis. Comput., vol. 30, no. 8, 2012.
[14] Y. Lei, M. Bennamoun, M. Hayat, and Y. Guo, "An efficient 3D face recognition approach using local geometrical signatures," Patt. Recognit., 2013.
[15] H. Mohammadzade and D. Hatzinakos, "Iterative closest normal point for 3D face recognition," IEEE Trans. Patt. Anal. Mach. Intell., vol. 35, no. 2, pp. 381–397, 2013.
[16] L. Yin, X. Wei, Y. Sun, J. Wang, and M. Rosato, "A 3D facial expression database for facial behavior research," in 7th Int. Conf. Automatic Face and Gesture Recognition (FGR), 2006, pp. 211–216.
[17] Y. Lei, M. Bennamoun, and A. A. El-Sallam, "An efficient 3D face recognition approach based on the fusion of novel local low-level features," Patt. Recognit., vol. 46, no. 1, pp. 24–37, 2013.
[18] K. Moreland and E. Angel, "The FFT on a GPU," in Proc. ACM SIGGRAPH/EUROGRAPHICS Conf. Graphics Hardware, 2003, pp. 112–119.