Data Masking

Data masking replaces sensitive data with fictitious but realistic values. The technique makes it possible to use data for testing, development and analysis without exposing real personal data or business-critical information.

    What is data masking?

    Data masking is a technique in which sensitive data is replaced with fictitious but structurally correct values. A national identification number, for example, can be replaced with a random number in the same format, and a name can be replaced with a generated name. The masked data looks like real data, but when masking is done properly it can no longer be traced back to the original individual.
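
    As an illustration of "a random number in the same format", the following is a minimal Python sketch. The function name `mask_digits` and the sample value are made up for this example, and the `DDMMYY-NNNN` layout is only an assumed format. The sketch keeps length and separators but does not guarantee that the result is a valid date or passes any checksum.

```python
import random
import string

def mask_digits(value: str, rng: random.Random) -> str:
    """Replace every digit with a random digit; keep separators and length."""
    return "".join(
        rng.choice(string.digits) if ch in string.digits else ch
        for ch in value
    )

rng = random.Random()       # unseeded, so the output differs on each run
original = "010190-1234"    # made-up sample value, not a real identifier
masked = mask_digits(original, rng)
print(masked)               # same layout as the input, different digits
```

    Because each digit is drawn independently, the masked value carries no information about the original. A real implementation would usually also have to respect format rules such as valid dates.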

    Masking differs from encryption, which transforms data into unreadable text that can be decrypted. It also differs from pseudonymisation, where the link to the real identity can be restored. Data masking is typically a one-way process.

    Personal data is routinely used in test environments, development projects and analyses, and masking is a practical way to reduce the risk of data leaks in those settings. It builds on data classification, which determines which data requires masking.

    Masking methods

    Several techniques exist for data masking, and the right method depends on the data type and purpose:

    • Substitution: Replaces values with random but realistic alternatives from a reference set. A real name is replaced with another name.
    • Shuffling: Rearranges values within a column so that a value no longer belongs to its original row. Salaries are reassigned at random between rows, but the set of salary values in the dataset is preserved.
    • Nulling: Replaces values with empty fields or null. Simple, but it makes the data less realistic for testing.
    • Variance-based masking: Alters numerical values by a random offset. An age of 34 might become 31 or 38.
    • Tokenisation: Replaces sensitive data with a token that refers to the original value in a secure vault. Often used for payment-card data. Unlike the other methods, it can be reversed by anyone with access to the vault.
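
    The methods in the list above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the `FAKE_NAMES` reference set, the `tok_` prefix and the in-memory `TokenVault` class are assumptions made for the example. A real vault would be a separately secured service, not a dictionary.

```python
import random
import secrets

rng = random.Random()

# Hypothetical reference set used for substitution.
FAKE_NAMES = ["Alex Berg", "Kim Holm", "Robin Lund", "Sam Dahl"]

def substitute(_value, reference=FAKE_NAMES):
    """Substitution: pick a realistic replacement from a reference set."""
    return rng.choice(reference)

def shuffle_column(values):
    """Shuffling: keep the same values but reassign them to rows at random."""
    shuffled = list(values)
    rng.shuffle(shuffled)  # may, by chance, leave some values in place
    return shuffled

def null_out(_value):
    """Nulling: drop the value entirely."""
    return None

def add_variance(value, max_offset=4):
    """Variance-based masking: shift a number by a random offset (may be 0)."""
    return value + rng.randint(-max_offset, max_offset)

class TokenVault:
    """Tokenisation: in-memory stand-in for a secure vault (assumption)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenise(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenise(self, token):
        return self._token_to_value[token]

salaries = [41000, 52000, 67000]
vault = TokenVault()
token = vault.tokenise("4111111111111111")  # widely used test card number

print(substitute("Jane Doe"))      # one of FAKE_NAMES
print(shuffle_column(salaries))    # the same three values, possibly reordered
print(null_out("555-0100"))        # None
print(add_variance(34))            # a value between 30 and 38
print(vault.detokenise(token) == "4111111111111111")
```

    Only `TokenVault` keeps a path back to the original value. The other four functions discard it.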

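    Static masking is applied to a copy of a database, for example before it is moved to a test environment, so the masked values are what is stored. Dynamic masking occurs in real time when users access data: the stored data is unchanged, and each user sees more or less of it depending on the user’s role and access rights.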
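
    Dynamic masking can be pictured as a filter applied at read time. The sketch below is a simplified illustration, not how any particular database product implements it. The role names, the `VISIBLE_FIELDS` policy and the `view_record` function are hypothetical.

```python
# Stored record: dynamic masking never changes it.
RECORD = {"name": "Jane Doe", "email": "jane@example.com", "salary": 52000}

# Hypothetical policy: which fields each role may see in clear text.
VISIBLE_FIELDS = {
    "hr_admin": {"name", "email", "salary"},
    "analyst": {"salary"},
    "support": {"name"},
}

def view_record(record, role):
    """Return a masked view of the record for the given role."""
    allowed = VISIBLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {
        field: (value if field in allowed else "***")
        for field, value in record.items()
    }

print(view_record(RECORD, "analyst"))
# {'name': '***', 'email': '***', 'salary': 52000}
print(view_record(RECORD, "intern"))
# {'name': '***', 'email': '***', 'salary': '***'}
```

    Static masking, by contrast, would write the masked values into the copied dataset once, so every reader of that copy sees the same masked data.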

    Practical applications

    Data masking is relevant in several scenarios:

    Test and development: Developers need realistic data to test applications. Masking makes it possible to use production-like datasets without exposing real personal data. It supports secure development and application security.

    Analysis and reporting: Data analysts can work with masked data to identify patterns and trends without access to sensitive information.

    Outsourcing: When third parties need access to data, masking reduces the risk. Combined with data loss prevention (DLP), it helps ensure that sensitive data does not leave the organisation in clear text.

    Training: New employees can train on systems with masked data, building security awareness from day one.

    Regardless of the scenario, masking should be combined with logging and monitoring to ensure that masking policies are adhered to.
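
    One simple form of such monitoring is an automated check that compares a masked dataset with its source and logs any sensitive value that survived unchanged. The sketch below assumes that both datasets are lists of dictionaries in the same row order. The function name `find_unmasked` and the logger name are made up for the example.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("masking_audit")

def find_unmasked(original_rows, masked_rows, sensitive_fields):
    """Return (row_index, field) pairs whose value is identical after masking."""
    leaks = []
    for index, (orig, masked) in enumerate(zip(original_rows, masked_rows)):
        for field in sensitive_fields:
            value = masked.get(field)
            # Nulled fields are fine; identical non-null values are not.
            if value is not None and value == orig.get(field):
                leaks.append((index, field))
    return leaks

original = [
    {"name": "Jane Doe", "phone": "555-0100"},
    {"name": "John Roe", "phone": "555-0101"},
]
masked = [
    {"name": "Alex Berg", "phone": None},
    {"name": "John Roe", "phone": None},  # the name was not masked
]

leaks = find_unmasked(original, masked, ["name", "phone"])
for index, field in leaks:
    log.warning("row %d: field %r is unmasked", index, field)
log.info("masking audit finished: %d leak(s) found", len(leaks))
```

    A row-by-row comparison like this will also flag a shuffled value that happens to land in its original row, so the results are best treated as items to review.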

    Regulations and standards

    GDPR does not mention data masking by name, but Article 32 lists pseudonymisation and encryption as examples of appropriate technical measures. Article 25 on data protection by design and by default calls for data minimisation, and masking is a direct way to support this.

    ISO/IEC 27001:2022 Annex A includes a dedicated control for data masking (A.8.11) and a control for the protection of test information (A.8.33), where masking is a relevant technique. An ISMS should define when masking is required.

    DORA and NIS2 impose requirements on protecting data in ICT systems, and masking is one way to help meet these requirements in test and development environments.

    Frequently Asked Questions about Data Masking

    What is the difference between data masking and encryption?

    Encryption transforms data into an unreadable form that can be decrypted with the correct key. Data masking replaces data with fictitious values, and the replacement is typically permanent and cannot be reversed. Encryption is mainly used to protect data in transit and at rest, while masking is mainly used for test environments and analyses.

    What is the difference between data masking and pseudonymisation?

    Pseudonymisation replaces identifying data with a pseudonym, but makes it possible to restore the link via a separate key. Data masking is typically irreversible. Under GDPR, pseudonymised data is still personal data. Masked data falls outside GDPR only if it is effectively anonymised, meaning that individuals can no longer be identified by any means reasonably likely to be used.

    When should data masking be used?

    Data masking is relevant when you need realistic data for testing, development, training or analysis but are not permitted to use real personal data. It is also useful for outsourcing or sharing data with third parties.
