Restricted-access or confidential datasets contain information that poses risks to human subjects, chiefly the potential disclosure of personal information. Data producers are generally concerned about respondent confidentiality with regard to:

  • direct identifiers: e.g. name or social security number

  • indirect identifiers, items that could be used in conjunction with publicly available information to identify individual respondents: e.g. detailed geographic information, occupation, or education

  • information that is sensitive in nature: e.g. personal medical or financial records, data about vulnerable populations such as children or prisoners

Data providers remove or mask direct and key indirect identifiers, or provide only aggregated data, in the public-use versions of data files. However, to facilitate research, some producers and providers may permit researchers to use restricted data files under controlled conditions.
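
As a rough illustration of what removing identifiers and aggregating can look like in practice, here is a minimal Python sketch. The field names (name, ssn, county, state, occupation), the sample records, and the helper functions are all hypothetical and chosen only for this example; actual providers apply their own, usually far more thorough, disclosure-review procedures.

```python
from collections import Counter

# Hypothetical field names, used only for illustration.
DIRECT_IDENTIFIERS = {"name", "ssn"}
DETAILED_GEOGRAPHY = {"county"}


def make_public_record(record):
    """Return a copy of `record` without direct identifiers or detailed geography."""
    dropped = DIRECT_IDENTIFIERS | DETAILED_GEOGRAPHY
    return {field: value for field, value in record.items() if field not in dropped}


def aggregate_by(records, field):
    """Count records per value of `field`, as an aggregated release might."""
    return Counter(record[field] for record in records)


# Made-up restricted records (not real data).
restricted = [
    {"name": "A. Example", "ssn": "000-00-0000", "state": "MI",
     "county": "Washtenaw", "occupation": "teacher"},
    {"name": "B. Example", "ssn": "000-00-0001", "state": "MI",
     "county": "Wayne", "occupation": "nurse"},
    {"name": "C. Example", "ssn": "000-00-0002", "state": "OH",
     "county": "Franklin", "occupation": "teacher"},
]

# Public-use version: identifiers and county removed, coarser state kept.
public_use = [make_public_record(r) for r in restricted]

# Aggregated version: only counts per state are released.
state_counts = aggregate_by(restricted, "state")
```

Dropping fields is only the simplest form of masking. Whether the remaining combination of fields (here, state and occupation) could still identify someone depends on the data, which is one reason the more detailed files are kept under restricted access.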

In the past, restricted-access datasets used in social science research were largely limited to these more detailed versions of standard data sources such as PSID and Add Health. Increasingly, researchers are also incorporating personal information drawn from the administrative records gathered by agencies in the course of their work.