Real-time image recognition for an Android app

We developed a neural network model that detects explicit content in images. It works in real time even on mobile devices without a GPU, offers multiple sensitivity settings that control what is considered NSFW, and outperforms the NSFW detection model released by Yahoo.

Client

The client is a major Android app developer specializing in utility and file-recovery tools for the Android operating system. They were working on an app that filters and hides explicit content on a user's device in real time, both to protect children using smartphones and to keep NSFW content from appearing on the screen in public.

Problem

The client wanted a model that detects explicit visual content on the user's screen in real time while running entirely on-device across a wide variety of Android smartphones.

The client had experimented with subscription-based services such as Google SafeSearch Detection, Azure Content Moderator, and Amazon Rekognition Content Moderation, as well as several open-source solutions. The open-source options could not reach acceptable accuracy, and the paid services proved too expensive in the long run.

They reached out to BroutonLab to develop a custom model for the Android platform that filters explicit content in real time. Fast inference was essential, since users own Android phones with vastly different hardware specifications. It was equally important to achieve high accuracy and to let users adjust settings that determine what content is considered NSFW.
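As an illustration of those adjustable settings, the sketch below shows one way a user-facing sensitivity level could map to per-category probability thresholds applied to the classifier's output. The category names and threshold values are hypothetical, not the client's actual configuration.

```python
# Minimal sketch (hypothetical names and values): mapping a user-facing
# sensitivity setting to per-category probability thresholds.

# Hypothetical NSFW categories produced by the classifier.
CATEGORIES = ["explicit", "suggestive", "neutral"]

# Stricter settings use lower thresholds, so more content is flagged.
SENSITIVITY_THRESHOLDS = {
    "strict":   {"explicit": 0.30, "suggestive": 0.50},
    "moderate": {"explicit": 0.60, "suggestive": 0.80},
    "relaxed":  {"explicit": 0.85, "suggestive": 1.01},  # never flags "suggestive"
}

def is_nsfw(probabilities: dict, sensitivity: str = "moderate") -> bool:
    """Return True if any flagged category meets or exceeds its threshold."""
    thresholds = SENSITIVITY_THRESHOLDS[sensitivity]
    return any(probabilities.get(cat, 0.0) >= thr for cat, thr in thresholds.items())

# Example: the same prediction is flagged under "strict" but not "relaxed".
pred = {"explicit": 0.40, "suggestive": 0.35, "neutral": 0.25}
print(is_nsfw(pred, "strict"))   # True
print(is_nsfw(pred, "relaxed"))  # False
```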

Example of the model’s output

Solution

The biggest challenges in explicit content detection were:

  • The subjective nature of the task - what is considered explicit?
  • Lack of high-quality datasets
  • Inability to use large and powerful models due to hardware restrictions of Android smartphones

A few pretrained models were available, but each fell short on accuracy, model size, or inference speed.

Datasets

Two open-source datasets were available, but both contained a large amount of mislabeled data. After agreeing with the client on what should be considered "explicit", we collected and annotated a custom dataset, which was later expanded to further improve the model's accuracy.
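For illustration, the snippet below shows a typical way such a labeled image dataset can be loaded and split for training with TensorFlow. The directory layout, class folders, image size, and batch size are assumptions, not the client's actual data pipeline.

```python
# Sketch of loading a directory-structured image dataset, assuming images are
# sorted into one folder per label, e.g. dataset/explicit/..., dataset/neutral/...
import tensorflow as tf

IMG_SIZE = (224, 224)   # typical input size for mobile-friendly backbones
BATCH_SIZE = 32

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
)

# Prefetch so data loading overlaps with training.
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.prefetch(tf.data.AUTOTUNE)
```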

Latency vs. accuracy comparison of quantized models running on a Pixel 4 CPU (from "Higher accuracy on vision models with EfficientNet-Lite")

Models

First, the available pretrained models were tested to measure their accuracy and performance, but the results were not satisfactory. Next, several small and large models were trained and tested on our custom dataset. The large models were then quantized: the precision of their floating-point parameters was reduced, shrinking the models and speeding up inference at a slight cost in accuracy. The larger models were also pruned: parts of the model that have little effect on accuracy were removed.
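The sketch below shows what these two steps typically look like with the TensorFlow Model Optimization Toolkit and the TFLite converter. The model file, sparsity schedule, and input shape are illustrative assumptions rather than the exact pipeline used in the project.

```python
# Hedged sketch of pruning followed by post-training quantization.
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.models.load_model("nsfw_classifier.h5")  # hypothetical path

# --- Pruning: remove weights that have little effect on accuracy ---
prune = tfmot.sparsity.keras.prune_low_magnitude
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000
)
pruned_model = prune(model, pruning_schedule=pruning_schedule)
pruned_model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# ... fine-tune pruned_model on the training set with
#     tfmot.sparsity.keras.UpdatePruningStep() in the callbacks ...
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

# --- Post-training quantization: reduce float32 parameter precision ---
def representative_data():
    # In practice, a few hundred real training samples would be used here.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(final_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
tflite_model = converter.convert()

with open("nsfw_classifier_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```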
The best model was selected and improved by tuning its hyperparameters and adding layers until a good balance of accuracy, model size, and performance was achieved. To further improve the accuracy of the model, the dataset was expanded.
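To make the "adding layers" step concrete, here is a minimal sketch of a compact classifier head on a mobile-friendly pretrained backbone. The MobileNetV2 backbone, head width, dropout rate, and class count are assumptions for illustration; the case study does not disclose the final architecture.

```python
# Hedged sketch of a compact classifier built on a pretrained mobile backbone.
import tensorflow as tf

NUM_CLASSES = 3  # e.g. explicit / suggestive / neutral (placeholder labels)

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the backbone first; unfreeze top blocks later

model = tf.keras.Sequential([
    # MobileNetV2 expects inputs scaled to [-1, 1].
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1, input_shape=(224, 224, 3)),
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),                    # tuned hyperparameter
    tf.keras.layers.Dense(128, activation="relu"),   # added head layer
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```

A model along these lines would then be pruned, quantized, and exported to a .tflite file as sketched above, so it can run fully on-device through the TensorFlow Lite runtime on Android.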