Governments and the private sector are collecting ever larger amounts of personal data. Such data are potentially highly valuable for scientists, e.g. for work on precision medicine and digital health. However, striking a balance between making data freely available for research and protecting individuals from potentially harmful disclosure and misuse of information is not an easy task. Efforts to guarantee effective de-identification have so far been inconclusive, particularly for large datasets, where preventing the re-identification of individuals is extremely difficult. Synthetic data can capture many of the complexities of the original datasets, such as distributions, non-linear relationships, and noise, yet synthetic datasets do not actually include any personal data. Synthetic data may therefore help provide solutions for well-understood domains, augment domain data when acquiring such data is sensitive or expensive, and support the exploration of machine learning algorithms and solutions when actual domain data are not available. As a result, a number of opportunities and challenges arise in the fields of artificial intelligence (e.g. machine learning applications) and of personal data processing for scientific purposes (e.g. the re-use of personal data).
• How can synthetic data improve today’s state of the art in AI?
• How can synthetic data improve today’s legal regulations on the processing of personal data for scientific purposes?
• What are the limits of this approach, e.g. its translational or operative boundaries?
• Which personal data applications could become game-changers through the use of synthetic data?