What is Feature Engineering for Machine Learning?
Note: before you learn about feature engineering for machine learning, make sure you understand features.
“Feature engineering is the art part of data science.” – Sergey Yurgenson, former #1 ranked global competitive data scientist on Kaggle
Feature engineering is the construction of additional variables, or features, for your dataset to improve machine learning model performance and accuracy. The most effective feature engineering is based on sound knowledge of both the business problem you’re trying to understand and your available data sources. It’s an exercise in engagement with the meaning of the problem and the data. For example, you might improve a model used to estimate likely loan defaults by finding external sources of relevant data, such as local unemployment rates or housing price trends.
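As a rough illustration of the loan example, the sketch below attaches an external regional statistic to each loan record as a new feature. The field names (`region`, `local_unemployment_rate`), the helper `add_external_feature`, and all values are hypothetical and made up for illustration; a real project would join on whatever key the loan data and the external source actually share.

```python
# Hypothetical loan records; every value here is invented for illustration.
loans = [
    {"loan_id": 1, "region": "A", "amount": 10_000},
    {"loan_id": 2, "region": "B", "amount": 25_000},
    {"loan_id": 3, "region": "C", "amount": 5_000},
]

# Hypothetical external data source: unemployment rate per region.
unemployment_by_region = {"A": 0.04, "B": 0.09}


def add_external_feature(rows, lookup, key, new_name, default=None):
    """Return copies of rows with a new feature looked up by rows[key].

    Rows whose key is missing from the lookup get `default`, so gaps in
    the external source stay visible instead of being silently dropped.
    """
    return [{**row, new_name: lookup.get(row[key], default)} for row in rows]


enriched = add_external_feature(
    loans, unemployment_by_region, key="region", new_name="local_unemployment_rate"
)
```

Keeping unmatched rows with an explicit default is a design choice in this sketch: it leaves the decision about how to handle missing external data to a later step.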
Why is Feature Engineering Important?
Understanding Feature Engineering
Creating new features gives you a deeper understanding of your data and results in more valuable insights. When done correctly, feature engineering is one of the most valuable techniques of data science, but it’s also one of the most challenging:
“Coming up with features is difficult, time-consuming, [and] requires expert knowledge.” – Andrew Ng, chief scientist of Baidu, co-chairman and co-founder of Coursera, and adjunct professor at Stanford University
Feature Engineering Examples
Imagine you want to predict how many turkeys you’re going to sell this year on Thanksgiving, a major U.S. holiday. To most machine learning algorithms, a date is just a string of numbers with no particular significance, so the algorithm has no idea which date is associated with Thanksgiving.
However, if you engineer features that tell the algorithm which dates are Wednesdays and which days occur immediately before each federal holiday, it can identify events that recur on the Wednesday before the fourth Thursday in November, that is, the day before Thanksgiving. Feature engineering exposes this kind of “common knowledge” information and expands the number of practical insights and the business value a dataset can yield.
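The date features described above can be sketched in a few lines. This is a minimal example, assuming only the standard library and handling Thanksgiving alone (the fourth Thursday in November) rather than every federal holiday; the function and feature names are hypothetical.

```python
from datetime import date, timedelta


def thanksgiving(year: int) -> date:
    """Return U.S. Thanksgiving (fourth Thursday of November) for a year."""
    nov1 = date(year, 11, 1)
    # date.weekday(): Monday is 0, so Thursday is 3.
    days_to_first_thursday = (3 - nov1.weekday()) % 7
    return nov1 + timedelta(days=days_to_first_thursday + 21)


def date_features(d: date) -> dict:
    """Turn a raw date into features a model can actually use."""
    return {
        "day_of_week": d.weekday(),
        "is_wednesday": d.weekday() == 2,
        "is_day_before_thanksgiving": d + timedelta(days=1) == thanksgiving(d.year),
    }


features = date_features(date(2023, 11, 22))
```

With these columns in place, the model no longer has to infer the holiday from raw date values; the flag marks the relevant day directly, whichever Wednesday of the month it happens to be.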
Feature Engineering + DataRobot
The DataRobot automated machine learning platform automatically applies both fundamental and sophisticated feature engineering techniques to all types of data. For example, DataRobot takes differences and ratios of numerical predictors, encodes categorical predictors by how frequently they occur, extracts individual words and pairs of adjacent words from free text, and extracts the day of the week and the day of the month from date fields. It also allows for manual feature engineering.
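To make the four techniques named above concrete, here is a small standard-library sketch of each. It is not DataRobot's implementation or API; the function names are hypothetical, and the tokenization (lowercasing and splitting on whitespace) is a simplifying assumption.

```python
from collections import Counter
from datetime import date


def diff_and_ratio(a: float, b: float) -> dict:
    """Difference and ratio of two numerical predictors."""
    # The ratio is undefined when the denominator is zero, so return None.
    return {"diff": a - b, "ratio": a / b if b != 0 else None}


def frequency_encode(values: list) -> list:
    """Replace each category with the fraction of rows that share it."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def words_and_bigrams(text: str) -> tuple:
    """Extract individual words and pairs of adjacent words from free text."""
    words = text.lower().split()
    bigrams = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return words, bigrams


def date_parts(d: date) -> dict:
    """Extract day of week (Monday is 0) and day of month from a date."""
    return {"day_of_week": d.weekday(), "day_of_month": d.day}
```

Each function maps a raw column value to one or more numeric or token features, which is the general shape of the transformations listed in the paragraph above.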