Last year, we published an article on our blog explaining how facial recognition works with face masks. There we presented the results of our facial recognition model, which achieved 58% accuracy on masked face images.
Since then, we have modified our algorithm for generating synthetic images of faces in masks, and our model's accuracy has improved significantly. We also prepared a dataset of authentic pictures of people wearing masks, which can serve as a benchmark for face recognition systems in the coronavirus pandemic era.
Why It Matters
Most people wear face masks during the COVID-19 pandemic, especially in public places, where a mask is often required. This challenges many applications that rely on facial recognition technologies: user identification and authentication become unreliable and less secure, leading to problems with unauthorized access and data privacy violations. Affected applications include:
- bank payments that identify the customer by their face
- monitoring of passengers at airports, train stations, and subways
- unlocking devices with the owner's face
Let's look at the main reasons why facial recognition technologies don't work well with masked faces.
Facial Features Under Masks
To identify a person, facial recognition technologies use the facial feature points of the person's face: eyebrows, eyes, nose bridge, lips, and so on. Unfortunately, a face mask makes many of these features unusable.
Masks hide most of the features related to the chin and nose, and they cover the lips completely. Since today's facial recognition technologies rely heavily on the quality and number of facial feature points, this inevitably degrades their performance.
Facial features of the upper half of the face, such as the eyes and eyebrows, might be enough to recognize a person wearing a mask, provided the images are of high quality. However, high-quality images are not available in most cases, especially from cameras in public places, airports, and offices. Pictures taken by public cameras vary widely in shooting angle, lighting, distance to the camera, noise, and compression.
The core of today's facial recognition technologies is deep learning, which requires a large amount of training data. At present, there are no publicly available datasets of masked face images, which further limits the performance of facial recognition. One possible answer to this problem is generating synthetic data.
We tried two approaches to generating a synthetic dataset of partially occluded faces. The first is to replace the lower half of the face with random noise, as in the example below.
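Assuming each training sample is an aligned face crop stored as an `(H, W, 3)` uint8 array, the first approach can be sketched in a few lines of NumPy (the function name and the midpoint split are illustrative, not taken from our production code):

```python
import numpy as np

def mask_lower_half_with_noise(face: np.ndarray, rng=None) -> np.ndarray:
    """Return a copy of an aligned face crop whose lower half is
    replaced with uniform random noise. `face` is (H, W, 3) uint8."""
    if rng is None:
        rng = np.random.default_rng()
    out = face.copy()
    h = face.shape[0]
    # Overwrite all rows from the vertical midpoint downward with random pixels
    out[h // 2:] = rng.integers(0, 256, size=out[h // 2:].shape, dtype=np.uint8)
    return out
```

The upper half of the crop, where the eyes and eyebrows live, is left untouched, so the network can only learn from the features a real mask would leave visible.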
The second approach involves detecting facial feature points, triangulating mask templates, and matching the points of the face and the mask, as we did in the previous article.
The problem with the second approach was that we extracted the mask templates directly from images. Since it is hard to detect facial feature points under a mask, these templates contained a lot of noise and often had rough corners. As a result, the synthetic data obtained from them did not look visually convincing.
To solve this problem, we manually annotated the key points on the mask templates, which significantly improved the quality of the synthetic data.
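A core step of this approach is mapping the annotated mask-template keypoints onto the detected facial landmarks. In the full pipeline this is done per triangle; the sketch below (function names are ours, for illustration) fits a single least-squares 2-D affine transform between two point sets with NumPy:

```python
import numpy as np

def fit_affine(template_pts: np.ndarray, landmark_pts: np.ndarray) -> np.ndarray:
    """Least-squares 2-D affine transform mapping mask-template keypoints
    onto facial landmarks. Both arrays are (N, 2) with N >= 3.
    Returns a (3, 2) matrix M; apply it as [x, y, 1] @ M."""
    n = template_pts.shape[0]
    A = np.hstack([template_pts, np.ones((n, 1))])  # homogeneous coords
    M, *_ = np.linalg.lstsq(A, landmark_pts, rcond=None)
    return M

def apply_affine(pts: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Transform (N, 2) points with the (3, 2) affine matrix M."""
    return np.hstack([pts, np.ones((pts.shape[0], 1))]) @ M
```

Warping each triangle of the mask template with its own affine transform (for instance via OpenCV's `cv2.warpAffine`) then places the mask pixels in face coordinates.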
To train and evaluate a deep learning model for face recognition, we generated two synthetic datasets of masked faces. The first is based on the large-scale VGGFace2 dataset (9131 identities with 300 images each), to which we applied masks and which we used for training. To evaluate the face recognition model's performance, we generated a masked version of the Labeled Faces in the Wild (LFW) dataset, the industry-standard benchmark for performance evaluation.
Real-World Masked Faces Dataset
In addition to the synthetic masked versions of these datasets, we also created and preprocessed a dataset of real-world masked faces. It consists of images of athletes, celebrities, and politicians. At the moment, it includes 100 identities and 450 photos in total. For each identity, there are images of the face both with and without a mask.
Creating such a dataset aims to answer the question: will a model trained on fully synthetic data work well on real-world data? This dataset is available to everyone, and you can download it here.
To evaluate our face recognition model's performance, we use the accuracy metric described in the paper "FaceNet: A Unified Embedding for Face Recognition and Clustering." Given a pair of face images, the model must determine whether they belong to the same person.
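Concretely, the FaceNet-style verification accuracy thresholds the distance between the two face embeddings and reports the best accuracy over a sweep of thresholds. A minimal sketch, assuming embeddings are already computed as NumPy arrays (function and parameter names are ours):

```python
import numpy as np

def pair_accuracy(emb_a, emb_b, is_same, threshold):
    """Fraction of pairs classified correctly: a pair is predicted
    'same person' when the squared L2 distance is below the threshold."""
    dist = np.sum((emb_a - emb_b) ** 2, axis=1)
    return np.mean((dist < threshold) == is_same)

def verification_accuracy(emb_a, emb_b, is_same,
                          thresholds=np.arange(0.0, 4.0, 0.01)):
    """Best accuracy over a sweep of distance thresholds; the FaceNet
    paper evaluates this with 10-fold cross-validation on LFW."""
    return max(pair_accuracy(emb_a, emb_b, is_same, t) for t in thresholds)
```

With L2-normalized embeddings the squared distance lies in [0, 4], which is why the sweep covers that range.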
After 15 training epochs, on the masked version of the LFW dataset, our model achieves an accuracy of 97.7667%. On the real-world masked dataset, it achieves 95.5039%.
As you can see, our fully synthetic dataset can be used to train an effective face recognition model, and this result could be improved further by using higher-resolution images. Since we trained on low-resolution face crops, and the mask covers most of each face, the neural network struggles to extract features that generalize well.