AI-Powered Synthetic Data Generation

Collecting real-world data is expensive and time-consuming. Moreover, in many cases real-world data cannot be used for testing or training because of privacy requirements, for example in healthcare and the financial industry. Other limitations include the availability and sensitivity of the data. Deep learning and artificial intelligence algorithms need enormous datasets, and synthetic data generation can deliver comparable results in much less time and without compromising privacy.


What is Synthetic Data?

AI-generated synthetic data replicates the properties of real-world data without containing any identifiable information. It is manufactured algorithmically rather than measured from real-world events. Because it removes the need for vast volumes of real data, it lowers the barrier to deploying data science and to training machine learning algorithms. Synthetic datasets can also simulate a wide range of conditions that closely resemble authentic situations.

So why is it important for you?

Training deep neural networks (DNNs)

DNNs are irreplaceable when it comes to accurate image analysis and interpretation. They have an unparalleled capacity to differentiate between multiple object classes and are relatively simple to develop.

Synthetic data is used in machine learning to get better performance from neural networks. It removes the need to hand-label images and create segmentation masks for each object, and it helps train algorithms for stereo depth estimation, 3D reconstruction, semantic segmentation, and classification.

Development and training of autonomous vehicles

Imagine having to drive 10 million miles to collect real-world data for training the algorithms of AI-based autonomous cars. Without simulations based on synthetic datasets, training and testing self-driving vehicles would be time-consuming, expensive, and even dangerous. Simulated roads are far more cost-effective and safer than actual streets.

Fraud protection in financial services

AI can generate mock transactions modeled on real transactions extracted from a given period of financial logs. Malicious behavior is then injected into the synthetic dataset to evaluate the performance of fraud detection mechanisms.
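
As a rough illustration of this idea, the short Python sketch below generates mock transactions and injects a small fraction of anomalous ones as labeled fraud cases. The field names, amount distribution, and injection rule are illustrative assumptions, not a description of any production fraud-detection pipeline.

    import random

    def generate_transactions(n, fraud_rate=0.02, seed=42):
        """Generate mock transactions and inject labeled anomalies.

        Field names and the anomaly rule are illustrative assumptions.
        """
        rng = random.Random(seed)
        transactions = []
        for i in range(n):
            is_fraud = rng.random() < fraud_rate
            amount = rng.lognormvariate(3.5, 1.0)      # typical purchase amounts
            if is_fraud:
                amount *= rng.uniform(20, 100)         # injected outlier amounts
            transactions.append({
                "id": i,
                "amount": round(amount, 2),
                "hour": rng.randint(0, 23),
                "label": int(is_fraud),                # ground truth for evaluation
            })
        return transactions

    if __name__ == "__main__":
        data = generate_transactions(10_000)
        print(sum(t["label"] for t in data), "injected fraud cases out of", len(data))

Because the injected cases carry ground-truth labels, the same dataset can be used to measure how many of them a fraud detection mechanism actually flags.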

Example of Synthetic Data Generation

Data Scientists at BroutonLab have created a program that generates enormous synthetic data sets to train stereo vision algorithms of convolutional neural networks.

By exploiting powerful GPUs, the synthetic data generator tool rectifies stereo image pairs with a 100% accuracy rate and can be used for training and testing stereo algorithms.

Problem

An accurate analysis of stereo depth is essential in many fields, including drones, robotics, virtual reality, and online shopping.

For one of our projects, data scientists at BroutonLab needed to create a large aligned dataset from 3D photos of rings and corresponding depth maps.

Creating such a dataset manually is very expensive and time-consuming. To gather the necessary amount of data, we would need to photograph more than 10,000 rings with different shapes and textures, from different angles and against different backgrounds. Then we would need to do the depth mapping manually. Another challenge is that the dataset needs to be diverse enough for the neural network to generalize.

Solution

Data scientists at BroutonLab created a program to generate large datasets of ring images. Each ring consists of two parts: a base (the band itself) and a stone fitted into the base.

We created a set of ring models, textures, and materials for the rings to be made of. Our data scientists wrote a Python script using the Blender API to generate new rings automatically.

The script sets up the environment (lighting conditions) and renders the ring from different angles. It then produces a corresponding depth map for each image.
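
A heavily simplified sketch of such a script is shown below. It assumes Blender's default scene (which already contains a camera), uses a torus primitive as a stand-in for a real ring model, and writes images and depth maps to assumed paths under /tmp; the actual generator is more elaborate.

    import math
    import bpy

    scene = bpy.context.scene

    # Illustrative stand-in for a ring model: a simple torus primitive.
    bpy.ops.mesh.primitive_torus_add(major_radius=1.0, minor_radius=0.2)

    # Lighting conditions: a single sun lamp (the real generator varies these).
    bpy.ops.object.light_add(type='SUN', location=(4, -4, 6))

    # Enable the Z pass and route it to a file output node in the compositor.
    bpy.context.view_layer.use_pass_z = True
    scene.use_nodes = True
    tree = scene.node_tree
    tree.nodes.clear()
    rl = tree.nodes.new('CompositorNodeRLayers')
    depth_out = tree.nodes.new('CompositorNodeOutputFile')
    depth_out.base_path = '/tmp/depth'                 # assumed output folder
    depth_out.format.file_format = 'OPEN_EXR'          # keep depth as float values
    tree.links.new(rl.outputs['Depth'], depth_out.inputs[0])

    # Shoot the object from several angles, rendering image + depth map pairs.
    camera = scene.camera
    for i, angle in enumerate(range(0, 360, 18)):      # 20 shooting angles
        rad = math.radians(angle)
        camera.location = (5 * math.cos(rad), 5 * math.sin(rad), 2.5)
        camera.rotation_euler = (math.radians(70), 0, rad + math.pi / 2)
        scene.render.filepath = f'/tmp/images/ring_{i:03d}.png'
        bpy.ops.render.render(write_still=True)

Because every image is rendered from a known scene, the depth map comes out of the same render pass as the photo, so the pairs are perfectly aligned by construction.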

Result

We created a tool to generate large aligned datasets. Because each parameter can vary within a specific range, the dataset size grows multiplicatively with every parameter we add.

For example, suppose we vary the following ring parameters:

number of bases = 10

number of stones = 5

number of base materials = 5

number of stone materials = 5

number of shooting angles = 20

number of backgrounds = 10

The result is: 10 × 5 × 5 × 5 × 20 × 10 = 250,000 images in the dataset.
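
The count is easy to verify, or to enumerate directly, with itertools.product over the parameter ranges; the variable names below simply mirror the list above:

    from itertools import product

    # Parameter ranges from the example above.
    params = {
        "base": range(10),
        "stone": range(5),
        "base_material": range(5),
        "stone_material": range(5),
        "angle": range(20),
        "background": range(10),
    }

    combinations = list(product(*params.values()))
    print(len(combinations))    # 250000 image/depth-map pairs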

With the help of an experienced designer, such a dataset can be created from a small set of sample models in only a few days.