Would you like to reuse data for other studies, communicate on internal research, or just publish sensitive data? By making your data GDPR-compliant, our anonymisation platform AnonyMine, specialised in high-dimensional data such as clinical and microbiome data, gives a second life to the data you generate.
The European General Data Protection Regulation (GDPR) governs the processing of personal data within the European Union.
Since the very principle of clinical research is to process participants’ personal data in order to respond to research hypotheses, compliance with this regulation is essential. The data controller must define the processing, assess the risks and impacts, set up appropriate security measures and, of course, inform individuals and guarantee their rights.
The data obtained during clinical research are valuable, and it must be possible to enhance, share and re-use them to develop scientific knowledge and improve tools and processes.
However, this re-use also requires the processing to be compliant with the GDPR. These obligations can complicate, slow down, or even prevent the implementation of such research.
Therefore, only the data that are necessary and relevant for the trial objectives should be collected and processed, and they should not be used for any purposes other than those specified in the consent form. Although these limits help to protect volunteers, they also present certain ethical, scientific, and economic limitations: collaborating on the data is complicated; the return on investment for research companies depends even more heavily on the success of the clinical study; publishing in scientific journals often requires sharing the associated data, etc.
A solution is to anonymise data. Indeed, anonymised data are no longer considered personal data and are not subject to the obligations and restrictions of the GDPR. However, anonymising data is not a simple task. It requires applying appropriate techniques and methods that ensure a high level of irreversibility and prevent any re-identification of individuals from the anonymised data.
Why is pseudonymisation not enough?
Pseudonymisation is a technique that replaces some of the identifiers in personal data with pseudonyms, such as random numbers or codes. For example, a dataset could be pseudonymised by replacing the names and addresses of the customers with unique codes. The purpose of pseudonymisation is to reduce the risk of identifying individuals from the data, while still allowing some analysis or processing to be done.
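The customer example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `pseudonymise` function, the field names and the sample records are hypothetical and are not part of AnonyMine. Note that the key table linking codes to identities still exists, and that fields such as age and sex are left untouched.

```python
import secrets

def pseudonymise(records, identifier_fields):
    """Replace direct identifiers with a random code.

    Returns the pseudonymised records and a key table (code -> identifiers).
    Hypothetical helper for illustration only.
    """
    key_table = {}          # must be stored separately and protected
    codes_by_identity = {}  # the same person always receives the same code
    output = []
    for record in records:
        identity = tuple(record[f] for f in identifier_fields)
        if identity not in codes_by_identity:
            code = secrets.token_hex(8)  # random, carries no information
            codes_by_identity[identity] = code
            key_table[code] = dict(zip(identifier_fields, identity))
        # Drop the direct identifiers, keep every other field unchanged
        cleaned = {k: v for k, v in record.items() if k not in identifier_fields}
        cleaned["subject_code"] = codes_by_identity[identity]
        output.append(cleaned)
    return output, key_table

# Made-up sample data
customers = [
    {"name": "Alice Martin", "address": "1 Rue A", "age": 34, "sex": "F"},
    {"name": "Bob Durand", "address": "2 Rue B", "age": 51, "sex": "M"},
    {"name": "Alice Martin", "address": "1 Rue A", "age": 34, "sex": "F"},
]
pseudonymised, key_table = pseudonymise(customers, ["name", "address"])
```

Anyone holding `key_table` can reverse the operation, which is one reason the output remains personal data.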
However, pseudonymisation alone is not enough to anonymise data under GDPR. The reason is that individuals whose data have been pseudonymised can still be re-identified.
For example, if the pseudonymised dataset of a clinical trial contains information on age, sex, height, etc., it may be possible to link these data with other sources of information, such as social media profiles, and re-identify some or all of the volunteers.
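The linkage described above can be illustrated with a toy example. Everything here is hypothetical (the `link_records` function, the field names and the records are invented for the sketch); it simply matches pseudonymised records to an external list on the combination of age, sex and height, and reports the cases where exactly one external profile fits.

```python
QUASI_IDENTIFIERS = ("age", "sex", "height_cm")  # assumed indirect identifiers

def link_records(pseudonymised, public_profiles, keys=QUASI_IDENTIFIERS):
    """Return {subject_code: name} for records matching exactly one profile."""
    # Index the external source by its quasi-identifier combination
    index = {}
    for profile in public_profiles:
        combo = tuple(profile[k] for k in keys)
        index.setdefault(combo, []).append(profile["name"])

    reidentified = {}
    for record in pseudonymised:
        matches = index.get(tuple(record[k] for k in keys), [])
        if len(matches) == 1:  # a unique match is a candidate re-identification
            reidentified[record["subject_code"]] = matches[0]
    return reidentified

# Made-up pseudonymised trial data and made-up public profiles
trial = [
    {"subject_code": "S01", "age": 34, "sex": "F", "height_cm": 162, "diagnosis": "IBS"},
    {"subject_code": "S02", "age": 51, "sex": "M", "height_cm": 180, "diagnosis": "none"},
    {"subject_code": "S03", "age": 29, "sex": "M", "height_cm": 175, "diagnosis": "IBD"},
]
profiles = [
    {"name": "Alice", "age": 34, "sex": "F", "height_cm": 162},
    {"name": "Bob", "age": 51, "sex": "M", "height_cm": 180},
    {"name": "Carl", "age": 29, "sex": "M", "height_cm": 175},
    {"name": "Dan", "age": 29, "sex": "M", "height_cm": 175},
]
reidentified = link_records(trial, profiles)
```

In this toy data, S01 and S02 each match a single profile, while S03 matches two profiles and stays ambiguous. In practice a unique match is only a candidate, since the external source may not cover everyone, but the more attributes a dataset contains, the more often combinations become unique.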
Biofortis has turned its attention to these problems and now offers AnonyMine, an anonymisation solution specialised in large-scale data such as microbiome and clinical data.
How does the AnonyMine platform solve the problem?
Anonymisation is not a one-size-fits-all solution, but a complex and dynamic process that requires careful planning and evaluation.
Indeed, anonymisation has an impact on the quality of the data, which can affect its usefulness and reliability for various purposes. Thus, the goal of any anonymisation tool is to find a good balance between privacy and quality, meaning the ability to provide a synthetic dataset that still allows for truthful answers in downstream analysis without leaking sensitive information.
Our platform AnonyMine is specialised in high-dimensional data such as clinical and microbiome data, and leverages AI simulation approaches to anonymise data while preserving key statistical and biological information.
• First, sensitive data are compressed using an in-house transformation method capable of extracting biologically-driven insights.
• Insights and relationships between biological and clinical features are learned from the whole dataset. Using the entire dataset limits the loss of variability and allows us to retain key statistical information found in normal biological data. At this stage, we leverage the same artificial intelligence methodologies as those renowned for their performance on image, text, and sound generation tasks.
• Once links are learned, AnonyMine uses randomness to generate a very large number of high-quality fictitious individuals. Our custom algorithm is then able to select the most suitable fictitious individuals to provide a good trade-off between quality and confidentiality.
At the end, AnonyMine outputs a high-quality dataset with the same number of features and individuals as the initial dataset, and that satisfies target privacy metrics. The use of this new dataset thus becomes GDPR-compliant.
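To make the generate-then-select idea more concrete, here is a deliberately simplified Python sketch. It is not AnonyMine's algorithm: the compression step is omitted, the learned model is replaced by independent Gaussians per feature (which ignores relationships between features), and the privacy check is a plain distance-to-closest-record threshold. The `synthesise` function, its parameters and the sample values are all assumptions made for illustration.

```python
import math
import random
import statistics

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def synthesise(real, n_candidates=2000, min_distance=0.5, seed=0):
    """Toy generate-then-select pipeline (hypothetical, for illustration)."""
    rng = random.Random(seed)
    columns = list(zip(*real))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.stdev(c) for c in columns]

    # 1. Generate many fictitious individuals from the fitted stand-in model
    candidates = [
        tuple(rng.gauss(m, s) for m, s in zip(means, stdevs))
        for _ in range(n_candidates)
    ]

    # 2. Privacy filter: drop candidates too close to any real individual
    safe = [c for c in candidates
            if min(euclidean(c, r) for r in real) >= min_distance]
    if len(safe) < len(real):
        raise ValueError("not enough candidates pass the privacy threshold")

    # 3. Quality selection: for each real individual, keep the closest
    #    remaining safe candidate, so the output has the same size
    selected = []
    for r in real:
        best = min(safe, key=lambda c: euclidean(c, r))
        safe.remove(best)
        selected.append(best)
    return selected

# Made-up data: 6 individuals, 2 features on comparable scales
real_data = [(0.1, 1.2), (0.8, 0.4), (-0.5, 0.9), (1.5, -0.3), (-1.1, -0.7), (0.3, 2.0)]
synthetic = synthesise(real_data)
```

The output has as many individuals and features as the input, and no synthetic individual lies within `min_distance` of a real one. Raising `min_distance` strengthens this particular privacy check at the cost of fidelity, which is the balance between privacy and quality discussed above. Passing such a check is not, on its own, evidence of anonymisation in the GDPR sense.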