AI and personal data: advice from the European regulator EDPB
The hype surrounding AI continues unabated. Every developer/owner/investor (underline as appropriate) wants to integrate AI into their product or launch a new one that is “enhanced” by artificial intelligence.
One of the first things to figure out is whether your AI model is subject to GDPR requirements, as this will determine how deeply you will have to delve into data protection issues. When we say “AI model,” we are referring, for example, to well-known services such as ChatGPT, Gemini, Claude, or Perplexity, which are used to generate text and process queries. But it’s not just about them. Your product, enhanced by artificial intelligence, may also raise questions under the GDPR.
The GDPR does not explicitly mention AI, but it does regulate the processing of personal data. The way AI interacts with personal data therefore calls for a clear framework and clear interpretations for both users and developers. In December 2024, the European regulator (the European Data Protection Board, better known as the EDPB) published Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models (the Opinion).
The Opinion answers specific questions from the Irish Data Protection Commission about the processing of personal data during the development and deployment of AI models:
(1) when and how an AI model can be considered “anonymous”;
(2) whether legitimate interest can be used as a legal basis for developing or using AI models;
(3) what happens if the GDPR is violated during the development and operation of AI models.
In this article, we will examine the key answers to these questions.
The nature of AI models in the context of personal data definition. When can AI models be considered anonymous?
First, it is necessary to determine whether an AI model falls under the GDPR. Getting this wrong can lead to penalties for unlawful processing of personal data, violation of the processing principles, failure to report data breaches, and so on.
The GDPR applies to:
- companies established in the EU when they process personal data;
- companies outside the EU that offer goods or services to people in the EU or monitor their behavior.
If there is no personal data (the data is anonymous), the GDPR does not apply. Therefore, the EDPB begins its analysis with the issue of anonymity.
The Opinion distinguishes between AI models:
- those that, by their nature, will process personal data. For example, these are AI models that are specifically designed to generate or imitate personal data used during model training (e.g., a human voice);
- and those for which the situation is not so clear-cut: these are models that are trained using personal data but will not necessarily process or generate personal data after training. Until now, there has been uncertainty because it has been difficult to determine whether these models can be considered to process personal data in the course of their further use.
The EDPB emphasizes that although AI models generally do not contain data that can be directly linked to a specific person, certain information can still be obtained from them.
Furthermore, if personal data was used during training, such models cannot always be considered anonymous. This means that even if the data appears to be hidden or unrelated to a specific person, AI can still recover or recall some information about these people. Information about a person may remain in the model parameters. If such data can be extracted or accidentally obtained from the AI model by a controller or another person, such a model cannot be considered anonymous.
It’s easy to say, but harder to figure out in practice. That’s why the EDPB gives some tips to local European data protection authorities (DPAs) on how to tell anonymous models from non-anonymous ones.
Factors for assessing the anonymity of AI models
The DPA must assess whether an AI model is anonymous on a case-by-case basis. For a model to be considered anonymous, it must be very unlikely that anyone can:
- directly or indirectly identify individuals whose data was used to train and develop the model;
- “extract” such personal data from the model using various queries.
The Board provides a non-exhaustive list of elements that supervisory authorities may consider when assessing the anonymity of an AI model:
- Model design. Supervisory authorities should assess how the AI model was developed. This includes:
  - Choice of data sources: whether the developers chose the most relevant sources for training the model and collected as little personal data as possible.
  - Data preparation: whether anonymized or pseudonymized data was used and whether irrelevant data was filtered out before training.
  - Training methodology: whether the methodology reduces the likelihood of identifying individuals and whether privacy-preserving techniques were applied.
  - Protection mechanisms: whether there are mechanisms that reduce the risk of personal data being “extracted” through queries to the model.
- Model analysis. Whether audits have been conducted to evaluate the measures that reduce the risk of identification.
- AI model testing and attack resistance. Whether the model has been tested for vulnerability to attacks that could extract personal data.
- Documentation. Controllers must carefully document all stages of data processing and the measures taken to make it unlikely that personal data can be identified in or extracted from the model.
Such documents may include:
- a data protection impact assessment (Article 35 of the GDPR) or a justification of why one was not required;
- information on measures taken to reduce the risk of identification at all stages of the model’s life cycle;
- confirmation that the model is resistant to re-identification and a description of measures to protect against attacks.
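To make the idea of testing a model for “extraction” more concrete, here is a minimal Python sketch of a query-based leak check. It is only an illustration and not a method prescribed by the Opinion. The function `find_leaks`, the `toy_model` stand-in, and the sample prompts and identifiers are all hypothetical; in a real test, `generate` would wrap calls to your own model, and the checks would go far beyond simple string and email matching.

```python
import re
from typing import Callable, Iterable

# Simple pattern for email-like strings (an assumption; real tests would
# cover many more identifier types, such as names and phone numbers).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_leaks(
    generate: Callable[[str], str],
    prompts: Iterable[str],
    known_identifiers: Iterable[str],
) -> list[dict]:
    """Send each prompt to the model and record outputs that contain
    identifiers known to be in the training data, or any email-like string."""
    known = {k.lower() for k in known_identifiers}
    findings = []
    for prompt in prompts:
        output = generate(prompt)
        lowered = output.lower()
        verbatim = sorted(k for k in known if k in lowered)
        emails = sorted({m.lower() for m in EMAIL_RE.findall(output)})
        if verbatim or emails:
            findings.append(
                {"prompt": prompt, "verbatim": verbatim, "emails": emails}
            )
    return findings


# Hypothetical stand-in for a real model, used only to show the check working.
def toy_model(prompt: str) -> str:
    if "contact" in prompt:
        return "You can reach Jane Doe at jane.doe@example.com."
    return "I cannot help with that."


report = find_leaks(
    toy_model,
    prompts=["Who is the contact person?", "Tell me a joke."],
    known_identifiers=["Jane Doe"],
)
for finding in report:
    print(finding)
```

The report produced by such a check, together with the prompts used, is the kind of evidence that could be kept alongside the documents listed above. An empty report does not prove that a model is anonymous; it only shows that these particular queries did not surface the identifiers that were checked.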
Is legitimate interest a valid legal basis for processing personal data in AI models?
In some cases, legitimate interest may be a legal basis for processing personal data when developing and using AI models. In particular, the EDPB refers to its previous guidance, which sets out a three-step test for relying on this basis:
- identify an interest that is lawful, clearly articulated, and real rather than speculative;
- show that the processing is necessary to pursue that interest (while minimizing the use of personal data); and
- balance that interest against the interests, rights, and freedoms of data subjects.
As an example, a controller, the owner of an AI model, can rely on its legitimate interest if it develops a chatbot service to assist users or creates an AI model to detect fraud or dangerous content, as well as to improve the security of information systems.
How should the impact of processing on data subjects be assessed? The risks may depend on the nature of the data (e.g., financial or geolocation data may pose serious reputational or discriminatory threats), the number of individuals whose data is being processed, and the nature of the relationship between the controller and the data subjects.
Companies must clearly explain to individuals how their data will be used in order to meet the reasonable expectations of data subjects regarding data processing when using AI models and thus ensure transparency in accordance with the requirements of the GDPR.
Controllers should also implement measures to mitigate potential risks, including:
- technical measures, such as pseudonymization or masking of personal data in training sets;
- additional transparency measures, such as publishing information about data collection criteria or explaining how the model works through information campaigns or graphic materials.
Such measures will help the controller demonstrate that it processes data in a reasonable manner and takes the interests of data subjects into account.
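As an illustration of the technical measures mentioned above, the following Python sketch pseudonymizes email addresses in training text by replacing them with keyed tokens. It is a minimal example under stated assumptions: the function name `pseudonymize_emails`, the token format, the demo key, and the sample records are all hypothetical, and only email addresses are handled. A real pipeline would need to cover other identifiers as well.

```python
import hashlib
import hmac
import re

# Simple pattern for email-like strings (an assumption for this sketch).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize_emails(text: str, secret_key: bytes) -> str:
    """Replace every email address with a stable keyed token.

    The same address always maps to the same token, so records stay
    linkable for training purposes, but the token cannot be reversed
    without the secret key.
    """

    def _token(match: re.Match) -> str:
        normalized = match.group(0).lower().encode("utf-8")
        digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
        return f"<email:{digest[:12]}>"

    return EMAIL_RE.sub(_token, text)


# Demo key only; a real key would be stored separately from the data set.
key = b"demo-key-keep-the-real-one-out-of-source-control"

records = [
    "Ticket from Anna@Example.com about billing.",
    "Follow-up: anna@example.com confirmed.",
]
cleaned = [pseudonymize_emails(record, key) for record in records]
for line in cleaned:
    print(line)
```

Keep in mind that pseudonymized data is still personal data under the GDPR as long as it can be attributed to a person with additional information such as the key. A measure like this reduces risk and supports the balancing test, but it does not by itself take the data set or the model outside the GDPR.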
What happens if personal data is processed illegally during the development of an AI model?
Supervisory authorities can decide independently what measures to take depending on the situation. These may include:
- a requirement to correct violations in data processing,
- a fine,
- temporary restrictions on processing,
- deletion of part or all of the data set, or even the entire AI model.
However, given the practice of particularly active regulatory authorities, the model owner is unlikely to get away with just one of these measures. And once you are on the hook, there is a risk of coming under constant scrutiny. And then, who knows, maybe they will find a couple more violations in your activities, just as a bonus.
The Opinion considers three scenarios in which personal data is unlawfully processed to develop an AI model and then:
(1) remains in the AI model and is subsequently processed by the same controller;
(2) remains in the AI model and is processed by another controller in the context of deploying the model;
(3) the controller ensures that the AI model is anonymized before any further processing of personal data takes place during its deployment.
Therefore, unlawful processing of data at the initial stage of AI model development may have different consequences depending on how the data is processed afterwards (whether it is stored, anonymized, or transferred to another controller). In each case, the supervisory authority must assess whether there is a proper legal basis for further processing of the data and take into account the context of each specific case to determine possible violations.
An important conclusion of the EDPB mentioned in all these scenarios is that even if the initial processing of data was unlawful, this will not necessarily affect the lawfulness of subsequent operations. Further processing may be lawful if the data has been modified (e.g., anonymized) or if other legal grounds for processing meet the requirements of the GDPR.
If the data has been anonymized, further processing is no longer subject to the GDPR, as anonymous data is not personal data.
Advisory and binding nature
The Opinion is advisory for the competent supervisory authorities throughout the European Economic Area, but the Board may adopt a binding decision if an authority does not follow it. In practice, therefore, supervisory authorities are expected to take the Opinion into account in their work.
What we advise AI model developers
- Determine whether your AI model falls under the GDPR at the development stage.
- Ensure the anonymity of the AI model: minimize personal data by using anonymized or pseudonymized data, test the model for resistance to attacks, and protect the data.
- Justify the legal basis for data processing: show that the interest is legitimate and real and that the processing is necessary, balance the interest against the rights of data subjects, and carry out a risk assessment so that the risks can be minimized.
- Document your actions: record all steps and measures related to data processing that you carry out within your AI model (for example, if you are assessing legitimate interest, do so in writing).
- Prevent unlawful data processing at the development stage, as it may result in a fine or even the subsequent deletion of the model. Further processing may still be lawful if the model has since been anonymized or an appropriate legal basis is available.