What Are Best Practices to Keep Private Information Safe While Building and Using AI Models?
First, your data science development and operations teams must be able to recognize what kinds of data count as personally identifiable information (PII). PII is any data that can be traced back to an individual’s identity, including something as simple as an email address. Generally, such data should not be used to train a machine learning model in the first place, since the model's aim is to learn relevant patterns from a dataset, not facts about specific individuals.
DataRobot has automatic PII detection available for on-premise installations of the platform. Various libraries also exist to help evaluate datasets for the presence of PII, which can be essential when pulling a very wide raw dataset with unknown variables from your data management system. When in doubt as to the presence of PII or whether a particular variable may qualify as such, consult your InfoSec team.
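As a rough illustration of what such a scan does, the sketch below flags columns whose values look like email addresses or US-style phone numbers. It is not DataRobot's PII detection and not the API of any particular library: the function name `flag_pii_columns`, the two regular expressions, the `min_fraction` threshold, and the sample data are all assumptions made for this example, and real PII detection needs far broader coverage than two patterns.

```python
import re

# Illustrative patterns only (assumed for this sketch); they will miss many
# forms of PII and can produce false positives.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def flag_pii_columns(columns, min_fraction=0.1):
    """Return {column name: [pattern labels]} for columns where at least
    `min_fraction` of the non-empty values match a PII pattern.

    `columns` maps a column name to a list of raw values.
    """
    flagged = {}
    for name, values in columns.items():
        # Ignore missing and blank values so they do not dilute the match rate.
        texts = [str(v) for v in values if v is not None and str(v).strip()]
        if not texts:
            continue
        hits = []
        for label, pattern in PII_PATTERNS.items():
            matches = sum(1 for t in texts if pattern.search(t))
            if matches / len(texts) >= min_fraction:
                hits.append(label)
        if hits:
            flagged[name] = hits
    return flagged


# Hypothetical sample of a "wide raw dataset" with unknown variables.
sample = {
    "contact": ["ana@example.com", "bob@example.org", None],
    "notes": ["call back at 555-123-4567", "no answer", ""],
    "basket_total": [19.99, 5.0, 42.5],
}

print(flag_pii_columns(sample))
```

A column flagged this way is a candidate for review, not a verdict; free-text fields such as `notes` are a common place for PII to hide even when the column name gives no hint.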
What Unique Risks to Privacy Do AI Systems Pose?
Data management best practices around the handling of sensitive information are not unique to AI, but AI does pose unique risks and challenges to privacy.
On the one hand, the quantity you are trying to predict might be sensitive. For example, consumer purchasing behavior alone can be leveraged to reveal sensitive information about the health, housing, employment, or marital status of your customers. This information might appear advantageous for ad targeting but risks a reputational backlash, even beyond any legal concerns. Consumers do not want to perceive that their privacy is being compromised by the data incidentally collected or sought by enterprises and platforms that they use.
On the other hand, AI systems might be subject to adversarial attacks designed to exploit the model to access information about your enterprise or customers. For example, a model inversion attack aims to use white-box information about a model to reconstruct the data used to train it.
That said, best practices in information security still hold true, and AI can help enforce them. For example, an AI system built around anomaly detection can also assist in the identification of attacks on your server or network.
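To make the anomaly detection idea concrete, here is a minimal sketch that flags unusual spikes in per-minute request counts using a modified z-score based on the median and the median absolute deviation (MAD). The function name `find_anomalies`, the sample traffic numbers, and the 3.5 cutoff (a commonly used rule of thumb) are assumptions for illustration; a production system would typically use richer features and a trained model.

```python
from statistics import median


def find_anomalies(counts, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute deviation,
    so a single extreme spike does not mask itself by inflating the spread.
    """
    if not counts:
        return []
    center = median(counts)
    mad = median(abs(x - center) for x in counts)
    if mad == 0:
        # No measurable spread: this sketch declines to flag anything.
        return []
    return [
        i
        for i, x in enumerate(counts)
        if 0.6745 * abs(x - center) / mad > threshold
    ]


# Hypothetical requests-per-minute series with one suspicious burst.
requests_per_minute = [120, 118, 125, 122, 119, 121, 900, 123]

print(find_anomalies(requests_per_minute))
```

The median-based score is used here instead of the mean and standard deviation because, in a short series, one large spike inflates the standard deviation enough to hide itself.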