Target Variable

What is a Target Variable in Machine Learning?

The target variable of a dataset is the feature of a dataset about which you want to gain a deeper understanding. A supervised machine learning algorithm uses historical data to learn patterns and uncover relationships between other features of your dataset and the target.

The target variable will vary depending on the business goal and available data. For example, let’s say you want to use sentiment analysis to classify whether tweets about your company’s brand are positive or negative. Some aspects of a tweet that can be useful as features are word tokens, parts of speech, and emoticons. A model cannot learn how those features relate to sentiment without first being given examples of which tweets are positive or negative (the target). Targets are often manually labeled in a dataset, but there are ways to automate this process (see semi-supervised machine learning).

Why are Target Variables Important?

Without a labeled target, supervised machine learning algorithms would be unable to map available data to outcomes, just as a child would be incapable of figuring out that cats are called “cats” without having been told so at least a few times. It is important to have a well-defined target since the only thing an algorithm does is learn a function that maps relationships between input data and the target. The model’s outcomes will be meaningless if your target doesn’t make sense.

Target Variables + DataRobot

DataRobot makes it easy to select a target and start building supervised models. Once a user uploads a dataset and indicates which feature they want to understand, DataRobot does the rest of the data science heavy lifting.

Target

Once a user chooses a target and hits “Start,” DataRobot automatically uncovers insights that show you how features relate to the target and how much each trained model has learned about the target. This helps you easily pick the best model to deploy in your production application as well as spot issues that are notoriously difficult to discern, such as target (data) leakage.

Next-level predictive analytics with the best AI platform