Report to DHS details significant facial recognition improvement from 3D Pose Correction

In 2010 CyberExtruder began a project to determine whether it was possible to build a system that could locate human faces, track their facial features, reconstruct 3D heads of those people, and use the 3D reconstruction to mitigate the detrimental effect of a subject's pose relative to the camera. The goal was to quantify the effect on facial recognition of the pose correction obtained from the 3D head reconstruction.

Three ranked identification tests were performed. All three tests used the exact same gallery and probe sets of images. The gallery contained 109 different individuals, 99 of which were close to an exact frontal pose and 10 of which had a head rotation of up to 5 degrees. The probe set contained approximately 10,000 images of those individuals, covering a head-pose range of +/-70 degrees yaw (head side-to-side) and +/-25 degrees pitch (head up/down) in steps of 5 degrees. The probe images also vary widely in environment and head scale; e.g. some heads belong to people far down a corridor whilst others are close to the camera, so the number of pixels representing a head differs between probe images.
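The 5-degree pose grid described above can be sketched as a simple binning function (a minimal illustration; the function name and grid-snapping rule are my assumptions, not taken from the report):

```python
def pose_cell(yaw_deg, pitch_deg, step=5.0):
    """Snap a measured head pose to its nearest 5-degree pose cell,
    matching the +/-70 deg yaw by +/-25 deg pitch probe grid."""
    def snap(angle):
        return round(angle / step) * step
    yaw, pitch = snap(yaw_deg), snap(pitch_deg)
    if abs(yaw) > 70 or abs(pitch) > 25:
        raise ValueError("pose outside the probe grid")
    return (yaw, pitch)

# A pose of 33.2 deg yaw, -11.9 deg pitch falls in the (35, -10) cell.
print(pose_cell(33.2, -11.9))   # → (35.0, -10.0)
```

Grouping probes into cells like this is what allows the per-cell rank statistics reported in the tables below.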

2D face recognition baseline

The first test provided a baseline comparison using a commercial facial recognition (CFR) engine applied to the unmodified 2D images in the gallery and probe sets. The remaining two tests made use of the Aureus 3D head reconstruction. Pose correction was obtained by generating the 3D head mesh with the Aureus 3D engine and then rendering that head back to 2D at zero head pose. The CFR engine was used to generate a facial template for every probe and gallery image. The test stepped through each probe image in a pose cell and matched its template against every template in the gallery set; the resulting matches were ranked (sorted) from best to worst. Ranked match lists were thus obtained for all probe images in that pose cell, and the cumulative percentage of correctly matched images per rank was computed for the cell. The whole process was then repeated for each pose cell.
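The per-cell ranking procedure above can be sketched as follows. This is a minimal illustration with a toy similarity matrix; the function and variable names are mine, not the report's, and it assumes every probe subject is present in the gallery:

```python
import numpy as np

def cumulative_match_rates(scores, probe_ids, gallery_ids, max_rank=3):
    """Cumulative percentage of correctly matched probes per rank
    for one pose cell.

    scores: (n_probes, n_gallery) similarity matrix, higher = better match.
    probe_ids / gallery_ids: subject labels for each row / column.
    """
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)
    # Sort gallery entries from best to worst match for every probe.
    order = np.argsort(-scores, axis=1)
    ranked_ids = gallery_ids[order]
    # Rank at which the correct subject first appears for each probe
    # (assumes each probe's subject occurs in the gallery).
    first_hit = (ranked_ids == probe_ids[:, None]).argmax(axis=1)
    return [100.0 * np.mean(first_hit <= r) for r in range(max_rank)]

# Toy cell: 3 probes against a gallery of 4 subjects.
scores = np.array([[0.9, 0.1, 0.2, 0.3],
                   [0.2, 0.8, 0.7, 0.1],
                   [0.5, 0.6, 0.1, 0.4]])
print(cumulative_match_rates(scores, [0, 2, 1], [0, 1, 2, 3]))
```

Here probe 1's correct subject appears at rank one, so rank zero is 66.7% while ranks one and two reach 100%, mirroring how the per-rank percentages in the tables accumulate.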
Table 1 -- Results of the CFR ranked identification test for head pose. The horizontal axis represents 70 degrees to zero degrees head yaw and the vertical axis represents +/-25 degrees head pitch. The results are presented for ranks zero, one and two. The bottom row in each ranking provides the average rankings over the whole pitch range. All rank values are presented as percentages. The rows highlighted in blue display the rankings at zero pitch and the rows highlighted in orange display the rankings averaged over all pitches.

3D face recognition baseline

The second test was designed to investigate whether the Aureus 3D model parameters were capable of identifying people from the same gallery and probe sets of posed images. The Aureus 3D model for each image contains various parametric vectors, ultimately culminating in a single combined parametric vector.

Table 2 shows that identification using the Mahalanobis distance between combined parametric vectors provides reasonable rank-zero identification. Averaged over pitch, the identification results range from 47% to 90% over a 40-degree yaw range; this is comparable to the CFR system, which ranges from 46% to 80% over a 30-degree yaw range. Using only the texture parametric vectors, a rank-zero performance of between 47% and 91% is achieved over a 40-degree yaw range. Comparing the CFR and texture-based Mahalanobis identification tests over a 40-degree yaw range yields rank-zero identification ranges of 19-80% for the CFR and 47-91% for the Mahalanobis distance. This suggests that the 3D reconstruction has a beneficial effect on facial recognition at pose.
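The Mahalanobis-distance ranking described above can be sketched as below. This assumes one combined parametric vector per gallery subject and a covariance matrix estimated from the gallery parameters (regularized here for stability); the function names and toy data are mine, not the report's:

```python
import numpy as np

def mahalanobis_rank(probe_vec, gallery_vecs, cov):
    """Rank gallery subjects by Mahalanobis distance from a probe's
    parametric vector; smaller distance = better match."""
    inv_cov = np.linalg.inv(cov)
    diffs = gallery_vecs - probe_vec                    # (n_gallery, d)
    # Squared Mahalanobis distance d^T C^-1 d for every gallery entry.
    d2 = np.einsum('ij,jk,ik->i', diffs, inv_cov, diffs)
    return np.argsort(d2)                               # best match first

rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 4))          # 5 subjects, 4-D parameter vectors
cov = np.cov(gallery, rowvar=False) + 0.1 * np.eye(4)  # regularized covariance
probe = gallery[3] + 0.01 * rng.normal(size=4)         # noisy copy of subject 3
print(mahalanobis_rank(probe, gallery, cov)[0])        # → 3
```

Weighting the difference by the inverse covariance means parameter dimensions that vary widely across the population count for less, which is the usual motivation for preferring Mahalanobis over plain Euclidean distance on parametric vectors.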

Table 2 -- Results of the Mahalanobis ranked identification test for head pose. The horizontal axis represents 70 degrees to zero degrees head yaw. The vertical axis represents 25 to -25 degrees head pitch. Ranks zero, one and two are displayed over the yaw range, with results averaged over head pitch (orange rows); zero-pitch rows are highlighted in blue. In the table, “COMB” denotes the identification results using the combined Aureus 3D parameters and “Tex” denotes the results using only the texture parameters.
 

An additional observation can be drawn from Table 2: the texture parameters appear to have a slight advantage over the combined parameters. It is hypothesized that this is because the texture parameters contain little or no expression variation, since they do not represent facial shape. This should improve identification, though at the cost of losing the identification information carried by facial shape, i.e. people have differently shaped faces. It is proposed that improved FR performance could be expected by correcting for facial expression; in that circumstance it is reasonable to expect the combined parameters to yield superior identification to the texture parameters alone.

2D face recognition with 3D pose correction results

The third test applied pose correction and then used the CFR SDK to generate a template file for each pose-corrected image in both the gallery and the probe sets. The ranked identification test was then repeated with these pose-corrected templates. The results of the third test therefore quantify the improvement that pose correction yields with the CFR engine.
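The pipeline of the third test can be sketched as below. All function names here are hypothetical stand-ins; the real Aureus 3D and CFR SDK calls and signatures are not given in the report, so the stubs only mirror the data flow (fit a 3D head, re-render at zero pose, then build a CFR template):

```python
def fit_3d_head(image):
    """Stand-in for the Aureus 3D engine: fit a 3D head mesh and
    texture to a 2D image (stubbed; real SDK call differs)."""
    return {"mesh": None, "texture": None, "pose": image["pose"]}

def render_frontal(head):
    """Re-render the fitted head back to 2D at zero yaw and pitch,
    i.e. the pose-correction step."""
    return {"pose": (0.0, 0.0), "source": head}

def cfr_template(image):
    """Stand-in for the CFR SDK: build a face template from a 2D image."""
    return ("template", image["pose"])

def pose_corrected_template(image):
    # Test 3 pipeline: 3D reconstruction -> zero-pose render -> template.
    return cfr_template(render_frontal(fit_3d_head(image)))

probe = {"pose": (35.0, -10.0)}          # e.g. 35 deg yaw, -10 deg pitch
print(pose_corrected_template(probe))    # template built at zero pose
```

The key point is that both gallery and probe images pass through the same pipeline, so the CFR engine always compares templates built from frontal renderings.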

Table 3 -- Comparative identification results using the CFR on the 2D images (CFR 2D) and on the pose-corrected (CFR 3D) images. The horizontal axis represents 70 degrees to zero degrees head yaw. The vertical axis represents 25 to -25 degrees head pitch. Results for zero head pitch are outlined in red and results averaged over all head pitches are outlined in orange. The CFR 2D results are highlighted in white, the CFR 3D results are highlighted in blue, and the percentage improvement from pose correction (CFR 3D minus CFR 2D) is highlighted in green. The yellow cells at the end of each green row present the average improvement over all head yaws for a particular head pitch. The bottom-right yellow cell displays the resulting improvement over all head poses. All values are presented for rank zero only.

Summary

It can be seen from Table 3 that using the CFR system with pose-corrected images significantly improves facial recognition matching performance. Over the same 30-degree yaw range, the pitch-averaged results improve from 46-80% to 72-98%. For a comparable lower-end rank-zero rate of 54%, the usable yaw range extends to 50 degrees. The bottom-right yellow cell in Table 3 shows an average rank-zero improvement of 27% over all poses when head pose correction is used.