Data can be anonymous

Anonymity

In principle, individual data collected for official statistics are strictly confidential. However, several special provisions of the Federal Statistics Act (BStatG) allow individual data to be passed on for analysis under certain conditions.

  • If the individual data cannot be attributed to the respondents or persons concerned, i.e. if no conclusions about them can be drawn from the data, the data may also be used outside official statistics (Section 16 (1) No. 4 BStatG). Such data meet the criterion of absolute anonymity.
  • For scientific research, the Federal Statistics Act permits the release of individual data that could be attributed to the respondent or statistical unit only with a "disproportionate amount of time, cost and manpower". Such data are regarded as de facto anonymized. Access to formally anonymous data may also be granted under certain conditions "within specially secured areas of the Federal Statistical Office and the statistical offices of the federal states". In both cases, the data may be made available only to "institutions with the task of independent scientific research", and the persons receiving the data must be public officials or persons with special obligations (Section 16 (6) BStatG).

The provision of anonymized data in the research data centers of the statistical offices of the federal and state governments is based on these provisions. The degrees of anonymity mentioned above are described below.

Absolute anonymity

Absolutely anonymous data have been altered through coarsening and the removal of features to such an extent that the respondent can no longer be identified. Official statistics offer absolutely anonymized microdata in the form of Public Use Files (PUF), which can be made available to anyone interested. Absolutely anonymous Campus Files are also available for teaching statistical methods.

De facto anonymity

Microdata are referred to as de facto anonymous if de-anonymization cannot be ruled out completely, but the information could be attributed to the respective statistical unit only "with a disproportionately large amount of time, cost and manpower" (Section 16 (6) BStatG). Under the Federal Statistics Act, de facto anonymous data may be used only by scientific institutions and only to carry out scientific projects.

When establishing de facto anonymity, the aim is to make a correct attribution of information to respondents as unlikely as possible while preserving as much of the statistical information content as possible. Various anonymization methods can be used for this. Common ones are methods that reduce information (e.g. aggregation, class formation) and methods that alter information (e.g. swapping). To judge whether data are de facto anonymous, the effort required for de-anonymization must be weighed against the benefit an attacker would gain from it.
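The two families of methods named above can be illustrated with a minimal Python sketch. It is not the procedure used by the statistical offices: the field names (`age`, `region`, `income`), the ten-year band width, the sample records and the helper functions `coarsen_age` and `swap_values` are all assumptions made for illustration, and the swapping shown here is a simple random pairwise exchange.

```python
import random


def coarsen_age(age, width=10):
    """Class formation: replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def swap_values(records, key, fraction, seed=0):
    """Swapping: exchange the values of `key` between randomly paired records.

    Returns new records; the input list is left unchanged.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]
    n_pairs = int(len(out) * fraction) // 2
    chosen = rng.sample(range(len(out)), 2 * n_pairs)
    for i, j in zip(chosen[::2], chosen[1::2]):
        out[i][key], out[j][key] = out[j][key], out[i][key]
    return out


# Hypothetical microdata records (invented values).
records = [
    {"age": 34, "region": "A", "income": 41000},
    {"age": 58, "region": "B", "income": 52000},
    {"age": 40, "region": "C", "income": 38000},
    {"age": 27, "region": "D", "income": 30000},
]

# Information reduction: exact ages become age bands.
coarsened = [{**r, "age": coarsen_age(r["age"])} for r in records]

# Information change: region values are exchanged between paired records.
swapped = swap_values(coarsened, "region", fraction=1.0, seed=1)
```

Coarsening lowers the detail of each record, whereas swapping keeps the overall distribution of the swapped feature intact but breaks its link to the other features of a record. Both reduce the statistical information content, which is the trade-off described above.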

In the research data centers, however, de facto anonymity results not only from the information remaining in the data, but also from the conditions under which the data are used and the opportunities for de-anonymization those conditions leave open. Whether a microdata set can be described as de facto anonymous therefore also depends on the access conditions. What additional knowledge about the statistical units is available, and where the data are used, is decisive. Data used outside the statistical offices (Scientific Use Files) require a greater loss of information to achieve de facto anonymity than data used inside them (guest researcher workstations).

Formal anonymity

The research data centers of official statistics make it possible to analyze formally anonymous microdata through controlled remote data processing and at guest researcher workstations, particularly for analyses that are finely disaggregated by subject or region. To establish formal anonymity, the direct identifiers and auxiliary features are removed from the data, while the remaining features and the subject-matter and regional classifications are retained.
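As a rough illustration of this step, the following Python sketch removes direct identifiers and auxiliary features from a record and keeps everything else. The field names and the two sets `DIRECT_IDENTIFIERS` and `AUXILIARY_FEATURES` are hypothetical; which features fall into each group depends on the statistic in question.

```python
# Hypothetical feature groups, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "address"}
AUXILIARY_FEATURES = {"questionnaire_id"}


def formally_anonymize(record):
    """Drop direct identifiers and auxiliary features; keep all other fields."""
    drop = DIRECT_IDENTIFIERS | AUXILIARY_FEATURES
    return {key: value for key, value in record.items() if key not in drop}


# Invented example record.
raw = {
    "name": "Example Person",
    "address": "Example Street 1",
    "questionnaire_id": "Q-0001",
    "municipality": "Example Town",
    "industry": "Retail",
    "turnover": 125000,
}

formal = formally_anonymize(raw)
```

Because the detailed regional and subject-matter information stays in the data, records prepared this way may still be identifiable, which is consistent with access being restricted to the controlled settings described above.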