Data Masking

Data masking replaces sensitive data with fictitious but realistic values. The technique makes it possible to use data for testing, development and analysis without exposing real personal data or business-critical information.

What is data masking?

Data masking is a technique in which sensitive data is replaced with fictitious but structurally correct values. A national identification number, for example, can be replaced with a random number in the same format, and a name can be replaced with a generated name. The masked data looks like real data but cannot be traced back to the original individual.
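The idea of a replacement "in the same format" can be sketched in a few lines of Python. This is a minimal illustration, not a production masker: the function name `mask_preserving_format` and the sample identifier are assumptions made for the example, and the random output is not guaranteed to pass any checksum or date validation that a real national identification scheme may apply.

```python
import random
import string

def mask_preserving_format(value: str, rng: random.Random) -> str:
    """Replace each digit with a random digit and each letter with a random
    letter of the same case; separators such as '-' are kept unchanged."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep punctuation and spacing so the format survives
    return "".join(out)

# Hypothetical identifier in a "six digits, dash, four digits" layout.
rng = random.Random()  # for real data, random.SystemRandom() avoids a predictable seed
print(mask_preserving_format("010190-1234", rng))
```

Because the replacement is drawn at random and nothing is stored, there is no key or lookup table that could turn the masked value back into the original.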

Masking differs from encryption, which transforms data into unreadable text that can be decrypted. It also differs from pseudonymisation, where the link to the real identity can be restored. Data masking is typically a one-way process.

Because personal data often ends up in test environments, development projects and analyses, masking is a practical way to reduce the risk of data leaks. It depends on data classification, which determines which data requires masking.

Masking methods

Several techniques exist for data masking, and the right method depends on the data type and purpose:

Substitution: Replaces values with random but realistic alternatives from a reference set. A real name is replaced with another name.
Shuffling: Rearranges values within a column so the association between a value and its row is broken. In a salary column, for example, every original salary still exists in the dataset but is no longer attached to the person it belonged to.
Nulling: Replaces values with empty fields or null. Simple, but it makes test data less realistic.
Variance-based masking: Alters numerical values by a random offset. An age of 34 might become 31 or 38.
Tokenisation: Replaces sensitive data with a token that refers to the original value in a secure vault. Often used for payment-card data. Unlike the other methods, it is reversible for anyone with access to the vault.
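The five methods above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names, the `FAKE_NAMES` reference set, the sample records and the in-memory `TokenVault` are all hypothetical, and a real vault would be a separately secured store rather than a dictionary.

```python
import random
import secrets

FAKE_NAMES = ["Alex Berg", "Sam Lund", "Kim Holm", "Robin Dahl"]  # hypothetical reference set

def substitute(values, reference, rng):
    """Substitution: replace every value with a random pick from a reference set."""
    return [rng.choice(reference) for _ in values]

def shuffle_column(values, rng):
    """Shuffling: keep the same values but break the link to their rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

def null_out(values):
    """Nulling: drop the values entirely."""
    return [None for _ in values]

def add_variance(values, max_offset, rng):
    """Variance-based masking: shift each number by a random offset."""
    return [v + rng.randint(-max_offset, max_offset) for v in values]

class TokenVault:
    """Tokenisation: swap a value for a random token and keep the mapping.
    A dict stands in for the vault here; this is an assumption for the sketch."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenise(self, value):
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenise(self, token):
        return self._token_to_value[token]

rng = random.Random()
names = ["Anna Jensen", "Peter Nielsen", "Maria Hansen"]  # made-up sample data
salaries = [42000, 51000, 38000]
ages = [34, 52, 27]

print(substitute(names, FAKE_NAMES, rng))
print(shuffle_column(salaries, rng))
print(null_out(names))
print(add_variance(ages, 4, rng))

vault = TokenVault()
token = vault.tokenise("4111 1111 1111 1111")  # a commonly used test card number
print(token, "->", vault.detokenise(token))
```

Two limitations are worth noting. A plain shuffle can leave a value in its original row by chance, especially in small columns, and a small random offset may still allow a close guess at the original number, so both usually need tuning to the dataset.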

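Static masking is applied to a copy of a database before it is moved to a test environment, so the masked values are stored permanently in that copy. Dynamic masking is applied in real time when users access data: the stored data is unchanged, and each user sees more or less detail depending on the user’s role and access rights.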
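Dynamic masking can be illustrated with a small role-based view function. The sketch below is an assumption-laden example rather than how any particular database product implements the feature: the `Customer` fields, the role names and the `UNMASKED_FIELDS` policy are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    national_id: str
    email: str

# Hypothetical policy: which roles may see which fields unmasked.
UNMASKED_FIELDS = {
    "support": {"name"},
    "compliance": {"name", "national_id", "email"},
}

def mask_text(value: str, visible: int = 2) -> str:
    """Keep the last `visible` characters and replace the rest with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

def view_for_role(customer: Customer, role: str) -> dict:
    """Build a per-request view; the stored record itself is never modified."""
    allowed = UNMASKED_FIELDS.get(role, set())  # unknown roles see every field masked
    return {
        field: value if field in allowed else mask_text(value)
        for field, value in vars(customer).items()
    }

record = Customer("Anna Jensen", "010190-1234", "anna@example.com")  # made-up data
print(view_for_role(record, "support"))
print(view_for_role(record, "compliance"))
```

The design choice to default unknown roles to a fully masked view means that a missing policy entry fails closed instead of exposing data.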

Practical applications

Data masking is relevant in several scenarios:

Test and development: Developers need realistic data to test applications. Masking makes it possible to use production-like datasets without exposing real personal data. It supports secure development and application security.

Analysis and reporting: Data analysts can work with masked data to identify patterns and trends without access to sensitive information.

Outsourcing: When third parties need access to data, masking reduces the risk. Combined with data loss prevention (DLP), it helps ensure that sensitive data does not leave the organisation in clear text.

Training: New employees can train on systems with masked data, building security awareness from day one.

Regardless of the scenario, masking should be combined with logging and monitoring to ensure that masking policies are adhered to.

Regulations and standards

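GDPR does not mention data masking by name, but masking can serve as one of the technical measures the regulation calls for. Article 25 on data protection by design and by default requires measures that implement data minimisation, and masking is a direct way to limit the personal data exposed in test, development and analysis environments.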

ISO/IEC 27001:2022 Annex A contains a dedicated control for data masking (A.8.11) and a control for test information (A.8.33), which requires test data to be appropriately selected, protected and managed. An ISMS should define when masking is required.

DORA and NIS2 impose requirements on protecting data in ICT systems, and masking is one way to help meet these requirements in test and development environments.

Frequently Asked Questions about Data Masking

What is the difference between data masking and encryption?

Encryption transforms data into an unreadable form that can be decrypted with the correct key. Data masking permanently replaces data with fictitious values, and the replacement cannot be reversed. Encryption is typically used to protect data in transit and at rest, while masking is typically used to provide safe data for test environments and analyses.

What is the difference between data masking and pseudonymisation?

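Pseudonymisation replaces identifying data with a pseudonym but makes it possible to restore the link using additional information, such as a key, that is kept separately. Data masking is typically irreversible. Under GDPR, pseudonymised data is still personal data. Masked data falls outside GDPR only if it is effectively anonymised, meaning individuals can no longer be identified by any means reasonably likely to be used; masking that leaves people re-identifiable from the remaining fields does not meet that bar.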
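The contrast between a restorable pseudonym and an irreversible mask can be sketched as follows. This is one possible arrangement, chosen for illustration: the `Pseudonymiser` class, the `mask` function, the HMAC-based pseudonym and the in-memory lookup table are all assumptions, and in practice the key and the table would be held separately from the pseudonymised dataset.

```python
import hashlib
import hmac
import secrets

class Pseudonymiser:
    """Keyed pseudonyms plus a lookup table that lets an authorised party
    restore the link. Key and table are assumed to be stored separately."""

    def __init__(self, key: bytes):
        self._key = key
        self._lookup = {}  # pseudonym -> original identifier

    def pseudonymise(self, identifier: str) -> str:
        digest = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
        pseudonym = "p_" + digest[:16]
        self._lookup[pseudonym] = identifier
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        return self._lookup[pseudonym]

def mask(identifier: str) -> str:
    """Irreversible masking: a random value, with no key and no mapping kept."""
    return "m_" + secrets.token_hex(8)

pseudonymiser = Pseudonymiser(secrets.token_bytes(32))
pseudonym = pseudonymiser.pseudonymise("010190-1234")  # made-up identifier
print(pseudonym, "->", pseudonymiser.reidentify(pseudonym))
print(mask("010190-1234"))
```

The same identifier always yields the same pseudonym under a given key, which keeps records linkable across tables, whereas each call to `mask` produces an unrelated value that nothing can map back.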

When should data masking be used?

Data masking is relevant when you need realistic data for testing, development, training or analysis but are not permitted to use real personal data. It is also useful for outsourcing or sharing data with third parties.