World quality report

The power of synthetic data

Regulations on personal data are becoming stricter, and that is a good thing. However, tighter rules pose a number of challenges, because many companies depend on data to respond to the needs of the market. Synthetic data is a solution developed precisely for this challenge. Discover the power of synthetic data and the model we have developed in this blog.

Data security and usage have recently become an important consideration, not only for companies but also for the individuals whose data needs protecting. Every day, we produce around 2.5 quintillion bytes of data (Forbes, 2018), whether as social media posts, tweets, transactions, likes, web searches or other activity. All of this data is invaluable to companies: they use it to build and understand customer profiles, look for trends, identify opportunities, tailor better services and products, and even anticipate events they can capitalise on. However, the same data can also be used to exploit, influence and abuse. This is why we need regulations such as GDPR that govern how companies and individuals gather and use data, and hold them accountable for it.

Right of access

GDPR replaced the EU's earlier Data Protection Directive: instead of a directive that each member state implemented in its own law, it is a single regulation that applies directly across the European Union. Under this regulation, personal data, or PII (personally identifiable information), is protected by restricting how it may be processed and used. The regulation protects end consumers and empowers them to choose what happens with their data and to find out how, and for what purpose, companies are using it; this is known as the 'right of access'. Individuals can also decide whether companies may use their data for different purposes. If an individual withdraws consent, companies generally have to delete the data they hold about that person (the 'right to erasure').

High fines

Another part of the GDPR focuses on how data is used: it prohibits companies from using data for anything other than specific purposes inherent to their business models. Companies need to be able to state what data they collect and for what purpose. So if a company uses production data for testing, this could amount to unlawful processing, especially if that use was not stated explicitly when consent was obtained from the individual.

There are, of course, ways to avoid incurring high fines, and one of them is to use pseudonymised or masked data. GDPR treats pseudonymisation as a recommended safeguard that can ease some obligations, but pseudonymised data still counts as personal data and the risk of a data breach remains. Better still is anonymised data, which is not regulated by GDPR, although it comes with risks as well. Anonymised data is meant to be impossible to trace back to a particular individual, yet recent research has shown that individuals can often still be re-identified from supposedly anonymised datasets, which leaves this strategy susceptible to adversarial attacks (Nature Communications, 2019).
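To make the difference between pseudonymisation and anonymisation more concrete, here is a minimal Python sketch under stated assumptions. It pseudonymises a direct identifier with a keyed hash (HMAC-SHA256), which is one common technique among several, and then counts how many records are still unique on a few quasi-identifiers (postcode, birth year, gender). The records, field names and key are hypothetical, and the uniqueness count is only a simple illustration of why re-identification is possible, not the method used in the cited study.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same input always maps to the same token, so records can still be
    joined, but matching a token back to an identifier requires the key.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose combination of quasi-identifiers occurs only once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

# Hypothetical customer records.
customers = [
    {"email": "ann@example.com", "postcode": "1011", "birth_year": 1985, "gender": "F"},
    {"email": "bea@example.com", "postcode": "1011", "birth_year": 1985, "gender": "F"},
    {"email": "cas@example.com", "postcode": "2022", "birth_year": 1990, "gender": "M"},
]

# Mask the direct identifier but keep the other columns as they are.
masked = [{**c, "email": pseudonymise(c["email"])} for c in customers]

share = unique_fraction(masked, ["postcode", "birth_year", "gender"])
print(f"{share:.2f} of records are unique on postcode, birth year and gender")
```

In this toy dataset the email addresses are replaced by tokens, yet the third customer is still the only one with that postcode, birth year and gender, so anyone who knows those three facts can single out the record. Using a keyed hash rather than a plain hash is a deliberate choice: without the key, an attacker cannot simply hash a list of known email addresses and match the tokens.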

Synthetic data

This is where synthetic data shines. Synthetic data looks and feels like real data, preserving the characteristics and relationships present in the real data. Sogeti's Testing^AI team has developed a new solution for creating synthetic data with AI, called ADA (Artificial Data Amplifier). ADA uses advanced neural networks to generate synthetic data that can be used in place of real data. ADA is not a generic data management tool; it is a custom solution that needs to be trained on real data. Typically, ADA extracts a dataset used in an application, environment or report, generates synthetic data from it and pushes that data back into your databases.

The advantages of using synthetic rather than real data are twofold. First, an entire dataset that looks and feels like your real data, but without the security risk of a data breach, is valuable for companies operating in highly regulated industries. Second, the solution is scalable: we can create virtually unlimited amounts of data from a small sample of the real data. This means we can create enough test data that is, once again, GDPR compliant because it is purely synthetic.

To learn more about ADA or find out how you can implement synthetic data, contact the Testing-AI team!
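ADA's neural networks are not described in detail here, so the following is only a much simpler statistical stand-in that illustrates two ideas from the text: a generator is fitted to a small sample of real data, and it can then produce as many synthetic rows as needed while preserving the relationships between columns. This sketch uses only the Python standard library, assumes purely numeric columns and fits a multivariate normal distribution (column means plus covariance) to them. The column names and values are hypothetical, and this is not how ADA works.

```python
import math
import random
import statistics

def fit_gaussian(rows):
    """Estimate column means and the sample covariance matrix of numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [statistics.fmean(row[j] for row in rows) for j in range(d)]
    cov = [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]
    return means, cov

def cholesky(a):
    """Return lower-triangular L such that L times L-transposed equals a.

    `a` must be symmetric positive definite, i.e. no column is a perfect
    linear function of the others.
    """
    d = len(a)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower

def synthesise(real_rows, n_samples, seed=None):
    """Sample synthetic rows from a multivariate normal fitted to real_rows."""
    rng = random.Random(seed)
    means, cov = fit_gaussian(real_rows)
    lower = cholesky(cov)
    d = len(means)
    synthetic = []
    for _ in range(n_samples):
        # Independent standard normal draws, correlated by multiplying with L.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        synthetic.append(
            [means[i] + sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        )
    return synthetic

# Hypothetical "real" sample: [age, monthly_spend] for six customers.
real = [[25, 120.0], [32, 180.0], [41, 260.0], [29, 150.0], [50, 310.0], [37, 220.0]]

# Amplify six real rows into 10,000 synthetic ones.
synthetic = synthesise(real, 10_000, seed=42)

real_means, real_cov = fit_gaussian(real)
syn_means, syn_cov = fit_gaussian(synthetic)
print("real means:     ", [round(m, 1) for m in real_means])
print("synthetic means:", [round(m, 1) for m in syn_means])
print("real vs synthetic age/spend covariance:",
      round(real_cov[0][1], 1), round(syn_cov[0][1], 1))
```

Because the synthetic rows are sampled from the fitted distribution rather than copied, they are not real customers, yet the averages and the strong positive link between age and spend carry over. A Gaussian model only captures means and linear correlations, though; categorical fields, skewed distributions and non-linear relationships need more expressive generators, which is the kind of gap neural approaches such as ADA aim to close.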

A new realism

Finally, there is a new realism across QA and testing globally. Having gone through the fire of COVID-19, test professionals have adapted to new ways of working that will be with us for a long time. Of course, there are still challenges, and there always will be, but the momentum towards more quality engineering, cloud-based technologies, analytics, AI and ML, amongst other trends, is encouraging. Furthermore, with the next generation of digital transformation underway in industry, which we refer to as Intelligent Industry, the value QA and testing bring to putting smart products into play will be immense.


Declan Coates
Digital & Data Portfolio Lead
+353 (0)1 639 0100
Sneha Sangoram
Cloud Solutions Specialist, Sogeti UK