Video-Based Face Recognition for a Sports App

computer vision


A US-based mobile application developer whose app lets fans receive videos of their appearances at live events. The company partners with TV networks and stadium video boards to deliver these videos to users.


The number of videos the client had to process kept growing. Fan shots make up only about 30 minutes of a 3-hour game video, yet every video had to be processed by hand: selecting the shots with fans, identifying those fans, and sending them their clips took an enormous amount of manual work.

Long turnaround times led to negative reviews and long waits for users. The client had to hire additional staff to process videos and handle customer complaints.

Open-source solutions for face recognition in video did not deliver the expected results, so the client approached the Data Scientists at BroutonLab to develop models for detecting gender, the quality of each fan's appearance, and fan shots in videos.


We built a named entity recognition (NER) system: a neural network that uses NLP techniques to process the input text and classify every word.
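The source does not say which tagging scheme the NER system used. A common convention is BIO tagging, where each word is labeled B-TYPE (beginning of an entity), I-TYPE (inside an entity), or O (outside any entity). The sketch below assumes that scheme and shows only the post-processing step that groups per-word predictions into entity spans; the model that produces the tags is out of scope, and the entity types PER and LOC are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-word BIO tags into (entity_type, entity_text) spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity being built, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts; an I- tag of a different type is treated as a start.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O": the word is outside any entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans
```

For example, tagging "John Smith cheers at Madison Square Garden" as `B-PER I-PER O O B-LOC I-LOC I-LOC` yields a person span and a location span.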

Because the training dataset was small, we used unsupervised learning (e.g., Word2Vec word embeddings learned from unlabeled text) to achieve accurate results.
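Word2Vec needs no labels: its skip-gram variant learns word embeddings by predicting the words around each center word in raw text, so the text supplies its own training targets. The sketch below shows only how those self-labeled (center, context) pairs are generated; the embedding training itself (for example, with negative sampling) is omitted, and the window size is an assumption.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every word within `window` positions of each center word."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

With `window=1`, the sentence "fans cheer loudly" produces four pairs, such as ("cheer", "fans") and ("cheer", "loudly"). A model trained to predict the second element from the first ends up with similar vectors for words that appear in similar contexts.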

We created an automatic pipeline for data collection, preprocessing, and training.

We developed a gender classification system that predicts a person's gender from their face in an image. It uses a backbone pre-trained on ImageNet with a classification head on top.
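The source does not name the backbone or describe how it was trained. A common transfer-learning recipe runs face crops through an ImageNet-pretrained network, treats its penultimate-layer outputs as feature vectors, and trains a small classification head on them. The stdlib-only sketch below shows just such a head, assuming the backbone features are precomputed and kept frozen: a binary logistic regression fitted by batch gradient descent. The toy 2-D vectors stand in for real backbone embeddings, and the learning rate and epoch count are arbitrary assumptions.

```python
import math
from typing import List, Sequence


def train_linear_head(features: List[Sequence[float]], labels: List[int],
                      lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit a binary logistic-regression head on fixed feature vectors.

    Labels are 0 or 1 (one per class). Returns weights with the bias as the last element.
    """
    dim = len(features[0])
    w = [0.0] * (dim + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(features, labels):
            p = predict_proba(w, x)
            err = p - y  # gradient of the log loss w.r.t. the logit
            for k in range(dim):
                grad[k] += err * x[k]
            grad[-1] += err
        # Average the gradient over the batch and take one descent step.
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of class 1 for feature vector x (zip stops before the bias weight)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))
```

In a real system, the head usually has one output per class and is trained in a deep-learning framework, often with the backbone fine-tuned afterwards. Keeping the backbone frozen at first is a common choice when labeled face data is limited.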

We trained a separate quality model to filter out low-quality face images provided by the face extractor.
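The source describes a trained quality model but not its inputs or architecture. To show where such a filter sits in the pipeline, the sketch below uses a classical heuristic instead of a learned model: it scores each face crop by the variance of its Laplacian, a standard blur measure, and drops crops that fall below a threshold. The threshold value and the list-of-rows grayscale image format are assumptions. In the described system, the score would come from the trained quality model.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels (higher means sharper)."""
    h, w = len(img), len(img[0])
    responses: List[float] = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            lap = (img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1]
                   - 4 * img[r][c])
            responses.append(lap)
    if not responses:  # image too small to have interior pixels
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((v - mean) ** 2 for v in responses) / len(responses)


def filter_faces(faces: List[Image], threshold: float) -> List[Image]:
    """Keep only the face crops whose sharpness score reaches the threshold."""
    return [face for face in faces if laplacian_variance(face) >= threshold]
```

A perfectly flat crop scores 0 and is discarded, while a crop with strong edges scores high and is kept. A learned model can also reject failure cases that this heuristic cannot detect, such as occluded or heavily rotated faces.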

We developed a 3D convolutional neural network for automated shot detection. It analyzes sequences of frames and detects transitions between shots, splitting the video into temporal segments. The model is deployed as a microservice and delivers high-accuracy results in production.
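The source does not give the network's input length or output format. A common design for 3D-CNN shot-boundary detectors feeds the network fixed-length clips of consecutive frames and returns, for each frame, a probability that a transition occurs there. Assuming that design, the sketch below shows the two steps around the model: splitting a video into overlapping clip windows and converting per-frame probabilities into shot segments. The network itself is not shown. The window length, stride, and 0.5 threshold are assumptions.

```python
from typing import List, Tuple


def sliding_windows(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """(start, end) frame ranges, end exclusive, for feeding fixed-length clips to the model."""
    if num_frames <= window:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)  # make sure the final frames are covered
    return [(s, s + window) for s in starts]


def probs_to_shots(probs: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Turn per-frame transition probabilities into (first, last) shots, both inclusive.

    Frames at or above the threshold are treated as transition frames (e.g. a cut or
    dissolve) and are not assigned to any shot.
    """
    shots: List[Tuple[int, int]] = []
    start = 0
    in_transition = False
    for i, p in enumerate(probs):
        if p >= threshold and not in_transition:
            # A transition begins: close the current shot just before it.
            if i > start:
                shots.append((start, i - 1))
            in_transition = True
        elif p < threshold and in_transition:
            # First frame after the transition opens a new shot.
            start = i
            in_transition = False
    if not in_transition and start < len(probs):
        shots.append((start, len(probs) - 1))
    return shots
```

For example, `probs_to_shots([0.0, 0.1, 0.0, 0.9, 0.2, 0.1, 0.8, 0.7, 0.0])` returns `[(0, 2), (4, 5), (8, 8)]`. Each resulting shot can then be passed to the face extractor.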


Using a CNN+LSTM model that we trained, we built automatic video captioning. The CNN extracts features from the image, and the LSTM learns a language model that generates captions from those features.
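In a typical CNN+LSTM captioner, the CNN encodes the frame once, and the LSTM then generates the caption one word at a time, with each step conditioned on the image features and the previous word. The source does not describe the decoding strategy. The sketch below assumes simple greedy decoding and hides the trained LSTM behind a hypothetical `step` function that returns scores over the vocabulary and an updated hidden state. The token ids, vocabulary, and `max_len` are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface to the trained LSTM:
# step(image_features, previous_token_id, state) -> (scores over vocabulary, new_state)
StepFn = Callable[[Sequence[float], int, object], Tuple[Sequence[float], object]]


def greedy_caption(features: Sequence[float], step: StepFn, vocab: List[str],
                   start_id: int, end_id: int, max_len: int = 20) -> List[str]:
    """Generate a caption by always taking the highest-scoring next word."""
    words: List[str] = []
    token, state = start_id, None  # begin from the start token with no hidden state
    for _ in range(max_len):
        scores, state = step(features, token, state)
        token = max(range(len(scores)), key=lambda i: scores[i])  # argmax
        if token == end_id:
            break
        words.append(vocab[token])
    return words
```

Greedy decoding is the simplest option. Beam search, which keeps several partial captions at each step, is a common alternative when caption quality matters more than speed.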

We trained the entire pipeline end-to-end.
