Here’s how machine learning can violate your privacy

Jordan Awan writes:

Machine learning has pushed the boundaries in several fields, including personalized medicine, self-driving cars and customized advertisements. Research has shown, however, that in learning patterns these systems also memorize aspects of the data they were trained on, which raises privacy concerns.

In statistics and machine learning, the goal is to learn from past data to make new predictions or inferences about future data. To achieve this goal, the statistician or machine learning expert selects a model to capture the suspected patterns in the data. A model applies a simplifying structure to the data, which makes it possible to learn patterns and make predictions.

Complex machine learning models have some inherent pros and cons. On the positive side, they can learn much more complex patterns and work with richer datasets for tasks such as image recognition and predicting how a specific person will respond to a treatment.

However, they also run the risk of overfitting to the data. This means that they make accurate predictions about the data they were trained on but, in doing so, start to learn additional aspects of that data that are not directly related to the task at hand. This leads to models that don’t generalize, meaning they perform poorly on new data that is of the same type as the training data but not identical to it.
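
To make the idea of overfitting concrete, here is a minimal sketch in Python. It is not from the original article: the straight-line "true" pattern, the noise level, the polynomial degrees and the names true_fn and fit_and_score are all assumptions chosen for illustration. It fits a simple model and a much more flexible one to the same ten noisy points, then compares the error on the training data with the error on fresh data of the same type.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def true_fn(x):
    # Hypothetical underlying pattern (an assumption): a straight line.
    return 2.0 * x + 1.0


# Ten noisy training points.
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_fn(x_train) + rng.normal(scale=0.5, size=x_train.size)

# New data of the same type: same range, different points, fresh noise.
x_test = np.linspace(0.0, 1.0, 200)
y_test = true_fn(x_test) + rng.normal(scale=0.5, size=x_test.size)


def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    model = Polynomial.fit(x_train, y_train, deg=degree)
    train_mse = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse = float(np.mean((model(x_test) - y_test) ** 2))
    return train_mse, test_mse


# Degree 1 matches the true pattern; degree 9 has enough freedom
# to pass through all ten training points, noise included.
simple_train, simple_test = fit_and_score(1)
complex_train, complex_test = fit_and_score(9)

print(f"degree 1: train MSE = {simple_train:.4f}, test MSE = {simple_test:.4f}")
print(f"degree 9: train MSE = {complex_train:.4f}, test MSE = {complex_test:.4f}")
```

Under these assumptions, the degree-9 model should show a training error close to zero, because it has effectively memorized the noise in the ten training points, while its error on the new data should be larger than that of the simple straight-line model. That gap between training and test error is the signature of overfitting described above.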