Indirect personal data findings cause surprises in GDPR analyses



When organisations review how well their information complies with the General Data Protection Regulation (GDPR), the number of indirect personal data findings often comes as a surprise. Personal data is expected in obvious places such as HR and CRM systems. But when data is actually found in dozens or even hundreds of systems, adding up to hundreds of thousands, millions or even hundreds of millions of findings, the situation can seem overwhelming.

Preparation for the GDPR becomes much easier when the mass of personal data is reviewed with the purpose of identifying data for which the GDPR is genuinely relevant. With automation, even large volumes of data can be reviewed. When the location and content of the relevant set of personal data has been identified, an appropriate and efficient management model and risk management method can be applied, followed by initial corrective action.

Indirect personal data differs from direct personal data

Direct personal data serves to uniquely identify a person. This includes data such as names, identity numbers, telephone numbers, email addresses, and in many cases even a person’s postal address or bank account number.
Indirect personal data, on the other hand, arises only in certain circumstances: when personal data is encoded in a system or a system architecture using a key, such as a number like 12345. The number alone has no meaning, but once it is used in a registry to identify a person, the key becomes personal data in all contexts.
There are two types of key references to personal data. The first is registry entries that are derived from user management and are automatically created so as to ensure traceability when the information system is used. Such fields, for example CreatedBy and ERNAM, are found in well-designed systems everywhere.
The second type is personal foreign keys that comprise references to personal master data between parts of a system. They are related to the purpose of the application, such as a reference to the contact person for a sales order, or a product owner in product information. These foreign keys are typically created as a result of user activities.
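The distinction between the two types of key reference can be illustrated with a minimal Python sketch. It is an illustration only: the function name, the column list and the flag saying whether a column references a person master table are hypothetical, and only CreatedBy and ERNAM come from the text above.

```python
# Audit-style fields named in the text. Real systems will have more;
# the set is an assumption that would need extending per system.
AUDIT_FIELDS = {"createdby", "ernam"}


def classify_column(column_name, references_person_table):
    """Return 'audit', 'personal_foreign_key' or 'other' for one column.

    references_person_table is assumed to come from schema metadata,
    for example a declared foreign key to a person master table.
    """
    if column_name.lower() in AUDIT_FIELDS:
        return "audit"
    if references_person_table:
        return "personal_foreign_key"
    return "other"


# Hypothetical columns: (name, references a person master table?)
columns = [
    ("CreatedBy", False),
    ("ERNAM", False),
    ("ContactPersonId", True),
    ("MaterialNumber", False),
]

classified = {name: classify_column(name, refs) for name, refs in columns}
```

In practice the foreign key relationships are often undeclared, so the second argument would have to be inferred from the data rather than read from the schema.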

The hundreds of millions of false findings of indirect personal data must be processed

False findings are typical of indirect personal data. In many organisations, the foreign key sets are poorly designed, and the same sets of numbers may be used as the codes for persons, materials, customer numbers or other types of information. Problems will occur when sets of numbers intersect and there is no way of knowing whether they refer to a person or to some other type of information.
“We have seen that, when data are reviewed, up to 90% of findings may be false. The number of false findings may rise to hundreds of millions. According to the GDPR, coded information is considered to be personal data unless it can clearly be demonstrated that it is not; a reversed burden of proof is applied.”
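The overlap problem can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any particular tool: the key sets, the candidate values and the function name are all hypothetical, and a real review would also need the context in which each value appears.

```python
def split_findings(candidate_values, person_keys, other_keys):
    """Sort candidate key values into confirmed, ambiguous and false findings.

    confirmed: the value is only a person key
    ambiguous: the value is both a person key and some other code
    false:     the value is not a person key at all
    """
    confirmed, ambiguous, false_findings = [], [], []
    for value in candidate_values:
        in_person = value in person_keys
        in_other = value in other_keys
        if in_person and in_other:
            ambiguous.append(value)
        elif in_person:
            confirmed.append(value)
        else:
            false_findings.append(value)
    return confirmed, ambiguous, false_findings


# Hypothetical data: "12345" is used both as a person key and a material code.
person_keys = {"12345", "20001"}
material_codes = {"12345", "77777"}
candidates = ["12345", "20001", "77777", "99999"]

confirmed, ambiguous, false_findings = split_findings(
    candidates, person_keys, material_codes
)
```

The ambiguous group is the costly one: without further context, such values cannot be shown not to refer to a person.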

Problems may arise, for example, when someone exercises their right to request all their personal data from the company. If the company cannot demonstrate that a data item is not personal data, reference keys that overlap with, for instance, product codes may force the company to hand over parts of its product registry or price list.

Misuse of data fields produces random findings

The situation may be further complicated if, for example, the name or description fields in a data system are not used correctly, or if personal data is included in comment fields. If an order refers to individuals using their name or telephone number, the table instantly becomes a personal data file. Such findings may come up in dozens or hundreds of registries.
Random findings must be corrected (in most cases, removed), and systems and operating models must be fixed. Management must ensure that users understand how to use the systems correctly, and must provide systems that work in compliance with the regulations.
“If the user’s work process includes collection of personal data, and the system doesn’t have the appropriate fields for this, it is understandable that the user enters the data in inappropriate fields.”
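As a rough illustration of how such random findings might be located, the following Python sketch scans a free-text field for email addresses and telephone numbers. The patterns are deliberately simple assumptions and will both miss cases and raise false alarms; names in particular cannot be found this way. The row structure and field names are hypothetical.

```python
import re

# Simplified patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{6,}\d")


def scan_free_text(rows, text_field):
    """Return (row id, kinds of personal data found) for rows with hits."""
    hits = []
    for row in rows:
        text = row.get(text_field) or ""
        kinds = []
        if EMAIL_RE.search(text):
            kinds.append("email")
        if PHONE_RE.search(text):
            kinds.append("phone")
        if kinds:
            hits.append((row["id"], kinds))
    return hits


# Hypothetical order rows with a comment field.
orders = [
    {"id": 1, "comment": "Deliver to rear entrance"},
    {"id": 2, "comment": "Call Jane on +358 40 123 4567 before delivery"},
    {"id": 3, "comment": "Invoice copy to jane.doe@example.com"},
]

hits = scan_free_text(orders, "comment")
```

A scan like this only points to where clean-up is needed; the lasting fix remains giving users appropriate fields for the data they have to record.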

Even problematic cases can usually be corrected

It has been interesting to see how the number of findings, all of which are now genuine, drops to a fraction of the original number once the cleansing procedure is complete. After identification, critically defining which data business operations actually require, and removing accumulated or outdated data, are the measures that minimise personal data as the GDPR requires.
A large part of the corrective action can often be performed in advance as background operations, but cooperation with operating service organisations and suppliers is usually necessary. Some issues cannot be solved without making changes to the systems, or replacing the systems altogether. It all begins with being able to identify the data that is actually subject to the GDPR.