Feature Selection

What is Feature Selection in Machine Learning?

Adding features to your dataset can improve the accuracy of your machine learning model, especially when the model is too simple to fit the existing data properly. However, it’s important to focus on features that are relevant to the problem you’re trying to solve and avoid those that contribute nothing. For example, if you’re trying to predict flight delays, today’s temperature may be important, but the temperature three months ago is unlikely to be.

Good feature selection eliminates irrelevant or redundant columns from your dataset without sacrificing accuracy. Unlike dimensionality reduction techniques that build new features by transforming or combining existing ones, feature selection keeps a subset of the original columns unchanged and discards the ones that don’t add value to your analysis.
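
As a rough illustration of removing columns rather than transforming them, the sketch below drops constant columns and columns that nearly duplicate one already kept. It is a minimal, unsupervised filter under stated assumptions: the column names, the sample values, and the 0.95 correlation threshold are all hypothetical, and a real workflow would also check each feature's relevance to the target.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to anything
    return cov / sqrt(var_x * var_y)


def select_features(columns, redundancy_threshold=0.95):
    """Return the names of the columns to keep.

    Columns are only kept or dropped; none are transformed or combined.
    The threshold is an illustrative assumption, not a recommended value.
    """
    kept = []
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # constant column: contributes nothing
        is_redundant = any(
            abs(pearson(values, columns[other])) >= redundancy_threshold
            for other in kept
        )
        if is_redundant:
            continue  # nearly duplicates a column that is already kept
        kept.append(name)
    return kept


# Hypothetical flight-delay columns, invented for this example.
flights = {
    "temp_today_c": [3.0, 12.0, 25.0, 8.0, 30.0],
    "temp_today_f": [37.4, 53.6, 77.0, 46.4, 86.0],  # same information in Fahrenheit
    "airline_country_code": [1, 1, 1, 1, 1],          # constant
    "departure_hour": [6, 21, 13, 9, 18],
}

selected = select_features(flights)
print(selected)
```

With this sample data, the Fahrenheit column is dropped as redundant and the constant column is dropped as uninformative, while the remaining columns are passed through untouched.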

Why is Feature Selection important?

The benefits of feature selection for machine learning include:

  1. Reducing the chance of overfitting.
  2. Reducing the CPU, I/O, and RAM load the production system needs to build and use the model. With fewer features, it takes fewer operations to read and preprocess the data and to train and score the model, which improves algorithm run speed.
  3. Increasing the model’s interpretability by revealing the most informative factors that drive the model’s outcomes.

Feature Selection + DataRobot

The DataRobot automated machine learning platform combines multiple approaches for feature selection in its modeling workflow:

  1. Model-agnostic feature importance. Before running any algorithms, DataRobot determines the univariate importance of each feature with respect to the target variable.
  2. Model-specific feature impact analysis. DataRobot produces a quantitative ranking of how impactful each feature is for each model it produces.
  3. Automated feature selection. DataRobot employs expert model blueprints that automatically select relevant features. The platform also supports manual tuning.
  4. Support for multiple feature lists. By running DataRobot on different subsets of features, users can see how different feature lists compare.
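
To make the first and last items more concrete, here is a minimal sketch of univariate importance and feature lists in plain Python. It is not DataRobot's implementation or API: absolute Pearson correlation is used only as a stand-in importance score, and the function names, feature names, and data are hypothetical.

```python
from math import sqrt


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature or target yields no signal
    return abs(cov / sqrt(var_x * var_y))


def rank_univariate(features, target):
    """Score each feature against the target on its own, highest score first."""
    scores = {name: abs_correlation(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical flight-delay data, invented for this example.
features = {
    "temp_today_c": [3.0, 12.0, 25.0, 8.0, 30.0, 18.0],
    "departure_hour": [6, 21, 13, 9, 18, 15],
    "seat_count": [180, 180, 150, 180, 150, 180],
}
delay_minutes = [5, 50, 25, 10, 40, 30]

ranking = rank_univariate(features, delay_minutes)
for name, score in ranking:
    print(f"{name}: {score:.3f}")

# Two feature lists that could each be used to train a model and then compared.
feature_lists = {
    "all_features": list(features),
    "top_2_univariate": [name for name, _ in ranking[:2]],
}
print(feature_lists)
```

A univariate score like this looks at one feature at a time, so it can miss features that only matter in combination with others. That is one reason to also compare models trained on different feature lists rather than relying on the ranking alone.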