Scoring Data

What does Scoring Data Mean?

Note: In data science, there are two types of scoring: model scoring and scoring data. This article is about the latter type.

In machine learning, scoring is the process of applying an algorithmic model built from a historical dataset to a new dataset in order to uncover practical insights that will help solve a business problem.

Model development is generally a two-stage process. The first stage is training and validation, during which you apply algorithms to data for which you know the outcomes to uncover patterns between its features and the target variable. The second stage is scoring, in which you apply the trained model to a new dataset. Then, the model returns outcomes in the form of probability scores for classification problems and estimated averages for regression problems. Finally, you deploy the trained model into a production application or use the insights it uncovers to improve business processes.

For example, to score a model meant to predict the likelihood of customer churn:

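  1. Build a churn behavior model using a historical dataset that contains information about customers who churned, along with other information thought to explain why they churned.
  2. Apply the model to your existing customer data to generate a value, or "score," that estimates how likely each customer is to churn.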
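The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not DataRobot's implementation: it fits a small logistic regression by gradient descent using only the standard library, and the feature names (support tickets, years as a customer), the toy data, and the customer IDs are all assumptions made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with plain batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def score(model, x):
    """Apply a trained model to one record and return a churn probability."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Step 1: build the model from historical data with known outcomes.
# Hypothetical features: [support_tickets, years_as_customer]; label 1 = churned.
history = [[5, 1], [4, 1], [6, 2], [0, 5], [1, 6], [0, 4], [3, 1], [1, 5]]
churned = [1, 1, 1, 0, 0, 0, 1, 0]
model = train_logistic(history, churned)

# Step 2: score existing customers whose outcomes are not yet known.
current_customers = {"cust_a": [5, 1], "cust_b": [0, 6]}
scores = {cid: score(model, x) for cid, x in current_customers.items()}

for cid, s in scores.items():
    print(f"{cid}: churn score {s:.3f}")
```

In practice you would use a machine learning library or platform rather than hand-written gradient descent. The point here is only the split between training on known outcomes and scoring new records.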

There are several ways to score a model.

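  1. Batch scoring. Useful when the model's decisions do not need to be acted on immediately. For example, a marketer can batch score a model against a list of purchased leads to determine which leads are most likely to buy a product.
  2. Real-time scoring. Useful when time is of the essence in realizing value from the model. For example, for a bank to quickly decline potentially fraudulent transactions, it needs a fraud model that scores credit card transactions within milliseconds.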
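The difference between the two modes is mostly about how the model is called, not about the model itself. The sketch below is an illustration under assumptions: the coefficients stand in for an already trained fraud model and are invented for this example, as are the field names, the function names `score_batch` and `score_realtime`, and the 0.5 rejection threshold. For simplicity it uses one model for both modes.

```python
import math
import time

# Hypothetical coefficients standing in for an already trained fraud model.
WEIGHTS = {"amount_usd": 0.004, "foreign": 1.5, "night": 0.8}
BIAS = -4.0

def score_record(record):
    """Return a fraud probability for a single transaction."""
    z = BIAS + sum(WEIGHTS[name] * record[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def score_batch(records):
    """Batch scoring: score a whole list at once and rank it, highest score first."""
    scored = [(r["id"], score_record(r)) for r in records]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def score_realtime(record, threshold=0.5):
    """Real-time scoring: score one record and return a decision immediately."""
    start = time.perf_counter()
    s = score_record(record)
    latency_ms = (time.perf_counter() - start) * 1000
    return {"id": record["id"], "score": s, "reject": s >= threshold,
            "latency_ms": latency_ms}

transactions = [
    {"id": "t1", "amount_usd": 20, "foreign": 0, "night": 0},
    {"id": "t2", "amount_usd": 900, "foreign": 1, "night": 1},
    {"id": "t3", "amount_usd": 300, "foreign": 0, "night": 1},
]

ranked = score_batch(transactions)             # e.g. run on a schedule
decision = score_realtime(transactions[1])     # e.g. run per incoming request
print(ranked)
print(decision)
```

A batch job typically runs on a schedule and writes its ranked results somewhere for later use, while a real-time scorer sits behind a service that must answer each request within a latency budget.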

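You can also use scoring to evaluate an existing model. Train the model on historical data, score other historical data for which the outcomes are known, and compare the scores with the known values to determine how well the model performs.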
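As a rough sketch of that comparison, the snippet below takes scores a model produced on held-out historical records and checks them against the known outcomes. The scores and outcomes are made-up values, and accuracy and log loss are just two common choices of metric, not necessarily the ones any particular tool reports.

```python
import math

def accuracy(scores, outcomes, threshold=0.5):
    """Share of records where the thresholded score matches the known outcome."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, outcomes)) / len(outcomes)

def log_loss(scores, outcomes, eps=1e-15):
    """Average negative log-likelihood; lower means better calibrated scores."""
    total = 0.0
    for s, y in zip(scores, outcomes):
        s = min(max(s, eps), 1 - eps)  # keep log() away from 0 and 1
        total += -(y * math.log(s) + (1 - y) * math.log(1 - s))
    return total / len(outcomes)

# Hypothetical scores on held-out historical data, with the known outcomes.
holdout_scores = [0.91, 0.15, 0.62, 0.48, 0.08, 0.77]
holdout_outcomes = [1, 0, 1, 1, 0, 0]

acc = accuracy(holdout_scores, holdout_outcomes)
loss = log_loss(holdout_scores, holdout_outcomes)
print(f"accuracy={acc:.3f} log_loss={loss:.3f}")
```

Accuracy depends on the chosen threshold, while log loss judges the probability scores directly, so the two can rank candidate models differently.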

Why is Scoring Important?

Scoring is a key component of understanding machine learning model outcomes and choosing the most accurate model that produces the most valuable insights. Once you have a model in production scoring new data, you’ll uncover insights that you can use to create business value.

Using the above example, the model scores identify which current customers are at a high risk of churning, enabling you to plan outreach or special offers to prevent that from happening.

Scoring + DataRobot

DataRobot’s Prediction Explanations feature has a great visualization of model output scores:

[Figure: preview of Prediction Explanations]

DataRobot shows the score in the second column from the left after the individual record ID.

In the above example, a hospital has built a classification model to determine the likelihood that a patient will be readmitted within 30 days. The model's score for patient ID 9155 is 0.888. In other words, the model estimates an 88.8% likelihood that this patient will be readmitted within 30 days. The "Explanations" columns list the top factors that contributed to that probability score.

Using the model score, the hospital can take action to reduce the probability of readmission, for example by delaying the discharge of patients with high readmission scores. The result is better patient outcomes and fewer fines for the hospital.

For information on how DataRobot handles scoring and deployment, see the Deployment wiki entry.
