Data Collection

What is Data Collection?

As a society, we’re generating data at an unprecedented rate (see big data). These data can be numeric (temperature, loan amount, customer retention rate), categorical (gender, color, highest degree earned), or even free text (think doctor’s notes or opinion surveys). Data collection is the process of gathering and measuring information from a wide range of sources. To use the data we collect to develop practical artificial intelligence (AI) and machine learning solutions, we must collect and store it in a way that suits the business problem at hand.
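As a minimal sketch of storing mixed data "in a way that makes sense," the example below keeps numeric, categorical, and free-text values together in one fixed tabular layout. It assumes a hypothetical loan-application record. The class name `LoanApplication`, its field names, and the sample rows are illustrative only, and CSV is just one possible storage format. The free-text notes contain commas, which the `csv` module quotes automatically so the columns stay aligned.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


# Hypothetical record type: the field names are illustrative, not a standard schema.
@dataclass
class LoanApplication:
    loan_amount: float   # numeric
    highest_degree: str  # categorical
    officer_notes: str   # free text


def write_records(records, stream):
    """Write records as CSV with a fixed header so every row has the same columns."""
    names = [f.name for f in fields(LoanApplication)]
    writer = csv.DictWriter(stream, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


def read_records(stream):
    """Read rows back, converting the numeric column from text to float."""
    reader = csv.DictReader(stream)
    return [
        LoanApplication(
            loan_amount=float(row["loan_amount"]),
            highest_degree=row["highest_degree"],
            officer_notes=row["officer_notes"],
        )
        for row in reader
    ]


# Hypothetical sample records for illustration.
records = [
    LoanApplication(25000.0, "Bachelor's", "Stable income, two prior loans repaid."),
    LoanApplication(8000.0, "High school", "Requested a shorter term, call back later"),
]

buffer = io.StringIO()
write_records(records, buffer)
buffer.seek(0)  # rewind so the reader starts at the header row
restored = read_records(buffer)
print(restored == records)  # True: the round trip preserves every field
```

Keeping one explicit schema means every collected record carries the same columns with consistent types, which is what later analysis and modeling steps rely on.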

Why is Data Collection Important?

Collecting data allows you to capture a record of past events so that you can use data analysis to find recurring patterns. From those patterns, you can build predictive models using machine learning algorithms that look for trends and predict future changes.

Predictive models are only as good as the data from which they are built, so good data collection practices are crucial to developing high-performing models. The data need to be free of errors (garbage in, garbage out) and contain information relevant to the task at hand. For example, a loan default model would not benefit from data on tiger population sizes but could benefit from gas prices over time.
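One simple way to catch "garbage in" before it reaches a model is to check each collected row against explicit rules. The sketch below assumes hypothetical rules (a loan amount must be a positive number, and the degree must come from a fixed list of categories). The `RULES` mapping, the `find_errors` helper, and the sample rows are illustrative assumptions, not part of any particular product or library.

```python
from typing import Any, Callable

# Hypothetical validation rules: each maps a column name to a predicate
# that returns True when the value is acceptable.
RULES: dict[str, Callable[[Any], bool]] = {
    "loan_amount": lambda v: isinstance(v, (int, float)) and v > 0,
    "highest_degree": lambda v: v in {"High school", "Bachelor's", "Master's", "Doctorate"},
}


def find_errors(rows: list[dict[str, Any]]) -> list[str]:
    """Return a message for every missing or invalid value, in row order."""
    errors = []
    for i, row in enumerate(rows):
        for column, is_valid in RULES.items():
            value = row.get(column)
            if value is None:
                errors.append(f"row {i}: missing {column}")
            elif not is_valid(value):
                errors.append(f"row {i}: invalid {column}={value!r}")
    return errors


# Hypothetical collected rows: the first is clean, the others contain errors.
rows = [
    {"loan_amount": 25000, "highest_degree": "Bachelor's"},
    {"loan_amount": -50, "highest_degree": "Bachelor's"},
    {"highest_degree": "PhD"},
]

for message in find_errors(rows):
    print(message)
# row 1: invalid loan_amount=-50
# row 2: missing loan_amount
# row 2: invalid highest_degree='PhD'
```

Rules like these only catch errors you anticipate; deciding which columns are relevant to the business problem still takes domain judgment.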

Data Collection + DataRobot

DataRobot partners with several organizations that assist in collecting, storing, and transforming data to make it ready for predictive modeling. Once you’ve collected and prepared the appropriate data for your specific business problem, you can easily import it into the DataRobot automated machine learning platform no matter where you’ve stored it. DataRobot then automatically creates new features and builds and evaluates hundreds of machine learning models, which you can immediately deploy into production.