Overfitting

What is Overfitting?

Overfitting happens when a machine learning model has become too attuned to the data on which it was trained and therefore loses its applicability to any other dataset. A model is overfitted when it is so specific to the original data that trying to apply it to data collected in the future would result in problematic or erroneous outcomes and therefore less-than-optimal decisions.

Here is the difference between a properly fitted and overfitted model:

Figure: a properly fitted model versus an overfitted model (Source: Quora)

The overfitted model is not going to be useful unless we apply it to the exact same dataset because no other data will fall exactly along the overfitted line.
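The gap the figure illustrates is easy to reproduce numerically. The sketch below (not DataRobot code; it assumes only NumPy and synthetic data) fits noisy samples of a simple linear trend with both a straight line and a very high-degree polynomial. The overfitted polynomial scores far better on the points it was trained on, but far worse on fresh points drawn from the same process.

```python
# Illustrative sketch: an overfitted model chases noise in the training data
# and fails on new data drawn from the same underlying process.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    x = np.sort(rng.uniform(0, 10, n))
    y = 2.0 * x + rng.normal(0, 2.0, n)   # underlying linear trend plus noise
    return x, y

x_train, y_train = sample(15)
x_new, y_new = sample(15)                 # "future" data from the same process

for degree in (1, 10):                    # properly fitted vs. overfitted
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    new_mse = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:6.2f}, new-data MSE {new_mse:6.2f}")
```

The degree-10 fit hugs the training points almost exactly, so its training error looks excellent, yet its error on the new sample is much larger than the simple line's.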

Why is Overfitting Important?

Overfitting causes the model to misrepresent the data from which it learned. An overfitted model will be less accurate on new, similar data than a model that is fitted more generally, yet the overfitted one will appear to have higher accuracy when you apply it to the training data. With no protection against overfitting, model developers might train and deploy a model they believe is highly accurate, when in fact it will underperform in production on new data.
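The simplest protection is to score the model on data it never saw during training. A minimal sketch of that check, assuming scikit-learn and a synthetic dataset (a deep, unpruned decision tree is used here only because it overfits easily):

```python
# Compare apparent (training) accuracy with accuracy on a held-out set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("training accuracy:", model.score(X_train, y_train))     # looks near-perfect
print("holdout accuracy: ", model.score(X_holdout, y_holdout))  # closer to real-world accuracy
```

A large gap between the two numbers is the signature of overfitting: the high training score is the accuracy the developer thinks they have, while the holdout score is closer to what production will deliver.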

Deploying an overfitted model can cause all kinds of problems. For example, if you believe your model is 95% accurate at predicting the likelihood of loan default when in reality it is overfitted and its true accuracy is closer to 60%, applying it to future loan decisions will lose business that would otherwise have been profitable and leave more customers dissatisfied.

Overfitting + DataRobot

The DataRobot automated machine learning platform protects against overfitting at every step of the machine learning lifecycle, using techniques such as training-validation-holdout (TVH) data partitioning and N-fold cross-validation stacked predictions for in-sample model predictions from the training data. DataRobot embeds the expertise of top-level data scientists and automates the fitting process, so you can focus on choosing the model most relevant to your business problem without second-guessing the model's real accuracy.
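The general techniques named above are not DataRobot-specific; a minimal sketch of a TVH-style split combined with N-fold cross-validation, using scikit-learn and synthetic data rather than DataRobot's own API, looks like this:

```python
# Train/validation/holdout partitioning plus N-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# The holdout partition is set aside first and not touched until the end.
X_tv, X_holdout, y_tv, y_holdout = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation estimates out-of-sample accuracy: each fold is
# scored by a model that never saw that fold during training.
cv_scores = cross_val_score(model, X_tv, y_tv, cv=5)
print("cross-validation accuracy:", cv_scores.mean())

# Only the chosen model is scored, once, on the untouched holdout.
final_score = model.fit(X_tv, y_tv).score(X_holdout, y_holdout)
print("holdout accuracy:", final_score)
```

Because every reported score comes from data the model did not train on, an overfitted candidate cannot hide behind inflated in-sample accuracy.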