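Feature Engineering

What Is Feature Engineering in Machine Learning?

Note: Before learning about feature engineering in machine learning, make sure you understand what features are.

“Feature engineering is the art part of data science.” – Sergey Yurgenson (formerly ranked #1 in Kaggle's global data scientist rankings)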

Feature engineering is the construction of new variables, or features, and their addition to your dataset to improve machine learning model performance and accuracy. The most effective feature engineering is grounded in a sound understanding of the business problem and of your available data sources, because it requires engaging with what the problem and the data actually mean. For example, you might improve a model that estimates the likelihood of loan defaults by bringing in relevant external data, such as local unemployment rates or housing price trends.
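As a minimal sketch of the loan-default example, the Python below joins a hypothetical external table of regional unemployment rates onto loan applications as a new feature. The `LoanApplication` fields, the region keys, and the rate values are all invented placeholders for illustration, not real statistics or any particular system's schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LoanApplication:
    # Hypothetical schema for illustration only.
    applicant_id: str
    region: str
    income: float
    loan_amount: float


# Hypothetical external source: local unemployment rate (%) by region.
# Placeholder values, NOT real statistics.
UNEMPLOYMENT_RATE_BY_REGION: Dict[str, float] = {"north": 4.2, "south": 6.8}


def add_external_features(app: LoanApplication) -> Dict[str, Optional[float]]:
    """Return the application's numeric fields plus a feature joined from the external source."""
    return {
        "income": app.income,
        "loan_amount": app.loan_amount,
        # Joined on the shared "region" key; None when the region is not in the external table.
        "local_unemployment_rate": UNEMPLOYMENT_RATE_BY_REGION.get(app.region),
    }


apps: List[LoanApplication] = [
    LoanApplication("a1", "north", 50_000.0, 10_000.0),
    LoanApplication("a2", "south", 40_000.0, 20_000.0),
    LoanApplication("a3", "east", 60_000.0, 5_000.0),
]
rows = [add_external_features(a) for a in apps]
for row in rows:
    print(row)
```

In practice the external table would come from a real data source and be joined on whatever key (region, postal code, date) the two datasets share; rows with no match need an explicit missing-value strategy.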

Why Feature Engineering Is Important

Understanding Feature Engineering

Creating new features gives you a deeper understanding of your data and can yield more valuable insights. When done well, feature engineering is one of the most valuable techniques in data science, but it is also one of the most challenging:

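“Coming up with features is difficult, time-consuming, and requires expert knowledge.” – Andrew Ng (former Chief Scientist at Baidu, Co-Chairman and Co-Founder of Coursera, Adjunct Professor at Stanford University)

Feature Engineering Example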

Imagine you want to predict how many turkeys you will sell this year around Thanksgiving, a major U.S. holiday. To most machine learning algorithms, a date is just a string of numbers with no particular significance, so the algorithm has no way of knowing which date corresponds to Thanksgiving.

However, if you engineer features that tell the algorithm which dates fall on a Wednesday and which dates fall immediately before each U.S. federal holiday, the algorithm can learn patterns tied to the Wednesday before the fourth Thursday in November – the day before Thanksgiving. Feature engineering exposes this kind of “common knowledge” information and increases the practical insights and business value a dataset can yield.
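A minimal sketch of the two date features described above, using only Python's standard library. It relies on the rule that U.S. Thanksgiving falls on the fourth Thursday in November; the function names are hypothetical, and it handles Thanksgiving only, whereas a full version would need a calendar of all U.S. federal holidays.

```python
from datetime import date, timedelta
from typing import Dict


def thanksgiving(year: int) -> date:
    """U.S. Thanksgiving: the fourth Thursday in November."""
    nov1 = date(year, 11, 1)
    # date.weekday(): Monday == 0, ..., Thursday == 3
    days_to_first_thursday = (3 - nov1.weekday()) % 7
    first_thursday = nov1 + timedelta(days=days_to_first_thursday)
    return first_thursday + timedelta(weeks=3)


def date_features(d: date) -> Dict[str, bool]:
    """Engineered features that expose 'common knowledge' about a raw date."""
    day_before_thanksgiving = thanksgiving(d.year) - timedelta(days=1)
    return {
        "is_wednesday": d.weekday() == 2,
        "is_day_before_thanksgiving": d == day_before_thanksgiving,
    }


for d in [date(2024, 11, 20), date(2024, 11, 27), date(2024, 11, 28)]:
    print(d, date_features(d))
```

Computing the holiday from the calendar rule, rather than hard-coding a date, matters because the day before Thanksgiving is not always the same Wednesday of the month: in 2024 it was the fourth Wednesday (November 27), while in 2018 it was the third (November 21).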

Feature Engineering + DataRobot

The DataRobot automated machine learning platform automatically applies both fundamental and sophisticated feature engineering techniques to all types of data. For example, DataRobot computes differences and ratios of numerical predictors, encodes categorical predictors by how frequently they occur, extracts individual words and pairs of adjacent words from free text, and extracts the day of the week and day of the month from date fields. DataRobot also supports manual feature engineering.
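To make the listed transformations concrete, here is a minimal standard-library Python sketch of each one. These are generic, hypothetical helper functions written for illustration; they are not DataRobot's implementation or API, and the text handling uses a naive lowercase whitespace split rather than a real tokenizer.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Optional, Tuple


def difference_and_ratio(a: float, b: float) -> Tuple[float, Optional[float]]:
    """Difference and ratio of two numeric predictors (ratio is None when b is 0)."""
    return a - b, (a / b if b != 0 else None)


def frequency_encode(values: List[str]) -> List[int]:
    """Replace each category with the number of times it occurs in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]


def words_and_bigrams(text: str) -> List[str]:
    """Individual words plus pairs of adjacent words (naive whitespace tokenization)."""
    words = text.lower().split()
    bigrams = [f"{w1} {w2}" for w1, w2 in zip(words, words[1:])]
    return words + bigrams


def date_parts(d: date) -> Dict[str, int]:
    """Day of week (Monday == 0) and day of month extracted from a date field."""
    return {"day_of_week": d.weekday(), "day_of_month": d.day}


print(difference_and_ratio(10.0, 4.0))
print(frequency_encode(["red", "blue", "red", "green", "red"]))
print(words_and_bigrams("Fast delivery great price"))
print(date_parts(date(2024, 11, 28)))
```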
