Pseudonymisation

Pseudonymisation is a technique that replaces direct identifiers in a data set with artificial pseudonyms, so that data cannot be linked to a specific individual without a separate re-identification key. GDPR cites pseudonymisation as an example of an appropriate security measure and a risk-reducing safeguard.

What is pseudonymisation?

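GDPR defines pseudonymisation in Article 4(5) as the processing of personal data in such a manner that it can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.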

In practice, pseudonymisation replaces direct identifiers (e.g. name and national identification number) with artificial keys (pseudonyms), whilst a separate re-identification key retains the link. The data set and the re-identification key must be stored separately and secured independently.
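The separation described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the field names `name` and `national_id`, the functions `pseudonymise` and `re_identify`, and the in-memory key table are all hypothetical. In a real system the key table would be held in a separate, independently secured store.

```python
import secrets

DIRECT_IDENTIFIERS = ("name", "national_id")  # hypothetical field names


def pseudonymise(records, key_table=None):
    """Replace direct identifiers with random pseudonyms.

    Returns (pseudonymised_records, key_table). The key table maps each
    pseudonym back to the original identifiers and must be stored apart
    from the pseudonymised data.
    """
    key_table = {} if key_table is None else key_table
    # Reverse index so the same person receives the same pseudonym each time.
    seen = {
        tuple(ids[f] for f in DIRECT_IDENTIFIERS): pseudonym
        for pseudonym, ids in key_table.items()
    }
    output = []
    for record in records:
        identity = tuple(record[f] for f in DIRECT_IDENTIFIERS)
        pseudonym = seen.get(identity)
        if pseudonym is None:
            pseudonym = secrets.token_hex(8)  # random, unrelated to the identity
            seen[identity] = pseudonym
            key_table[pseudonym] = {f: record[f] for f in DIRECT_IDENTIFIERS}
        # Keep every non-identifying field and attach the pseudonym.
        cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        cleaned["pseudonym"] = pseudonym
        output.append(cleaned)
    return output, key_table


def re_identify(pseudonym, key_table):
    """Look up the original identifiers; only possible with the key table."""
    return key_table[pseudonym]
```

Anyone holding only the pseudonymised records sees a random pseudonym and the remaining fields; restoring the name and national identification number requires calling `re_identify` with the key table.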

Pseudonymisation vs anonymisation

The two terms are often confused, but their legal consequences differ:

Pseudonymisation: Reversible. Data can be re-identified using the re-identification key. Pseudonymised data is still personal data and remains subject to GDPR.
Anonymisation: Irreversible. Data cannot be traced back to an individual. Anonymised data is no longer personal data and falls outside the scope of GDPR.

Pseudonymised data is still personal data: A common misconception is that pseudonymisation removes GDPR obligations. It does not. GDPR applies in full to pseudonymised data, because the data can be re-identified using the re-identification key.

Benefits in a GDPR context

Although pseudonymised data remains personal data, pseudonymisation provides tangible benefits:

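Reduces the risk and impact of data breaches. A breach of pseudonymised data is still a personal data breach and may still require notification, but the risk to data subjects is lower if the re-identification key is not compromised.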
Facilitates the use of data for analysis and testing without exposing direct identifiers.
Can support further processing for scientific or statistical purposes, where GDPR Article 89(1) names pseudonymisation as a possible safeguard.
Is explicitly mentioned as an appropriate security measure in GDPR Article 32(1)(a).

Pseudonymisation techniques

Common techniques include:

Tokenisation: Replacing identifiers with random tokens that have no mathematical relationship to the original data.
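Hashing: Applying one-way functions that produce consistent pseudonyms. Plain hashes of identifiers drawn from a small or predictable set (such as national identification numbers) are vulnerable to dictionary attacks unless the hash is combined with a secret salt or key that is stored separately.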
Encryption-based pseudonymisation: Using encryption to transform identifiers, with the decryption key serving as the re-identification key.
Number substitution: Replacing identifiers with sequential or random numbers from a lookup table.

The choice of technique depends on security requirements and the need to maintain correlation across data sets.
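The trade-off in hashing can be illustrated with a short Python sketch using only the standard library. It assumes a hypothetical identifier space of four-digit customer numbers, and the function names are illustrative. It shows why a plain hash can be reversed by a dictionary attack, and how a keyed hash (HMAC) keeps pseudonyms consistent across data sets while making them impossible to recompute without the key.

```python
import hashlib
import hmac
import secrets


def plain_hash(identifier: str) -> str:
    """Unkeyed SHA-256: consistent, but anyone can recompute it."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def keyed_pseudonym(identifier: str, secret_key: bytes) -> str:
    """HMAC-SHA-256: consistent for a given key, not recomputable without it."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(target_hash: str, candidates):
    """Hash every candidate with the plain hash and return the match, if any."""
    for candidate in candidates:
        if plain_hash(candidate) == target_hash:
            return candidate
    return None


# Hypothetical identifier space: four-digit customer numbers.
candidates = [f"{n:04d}" for n in range(10_000)]

# A plain hash is recovered by simply trying every possible identifier.
leaked = plain_hash("4711")
recovered = dictionary_attack(leaked, candidates)

# With a secret key held apart from the data, the same attack fails,
# while the pseudonym stays identical across data sets that share the key.
key = secrets.token_bytes(32)
pseudonym_a = keyed_pseudonym("4711", key)  # e.g. in one extract
pseudonym_b = keyed_pseudonym("4711", key)  # e.g. in another extract
```

In this sketch the secret key plays the role of the separately stored additional information. Using the same key for several data sets preserves correlation between them; using a different key per data set deliberately prevents it.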

Frequently Asked Questions about Pseudonymisation

What is pseudonymisation?

Pseudonymisation is the processing of personal data in a manner that makes it impossible to attribute the data to a specific individual without the use of additional information, which is kept separately. GDPR defines pseudonymisation in Article 4(5).

What is the difference between pseudonymisation and anonymisation?

Anonymisation is irreversible and the data can never be re-identified. Pseudonymised data is still personal data because it can be re-identified using the re-identification key. Anonymised data falls outside the scope of GDPR entirely.

Is pseudonymised data still personal data?

Yes. Pseudonymised data remains personal data under GDPR because the data can be re-identified using the separate re-identification key. All GDPR obligations continue to apply.

When should an organisation use pseudonymisation?

Pseudonymisation is valuable when you need to process personal data for analysis, testing or research whilst reducing the risk of exposing individuals. It is also explicitly mentioned in GDPR Article 32(1)(a) as an appropriate security measure.

What are common pseudonymisation techniques?

Common techniques include tokenisation (replacing identifiers with random tokens), hashing (one-way functions producing consistent pseudonyms), encryption-based pseudonymisation and number substitution. The choice depends on security requirements and data correlation needs.