
Geospatial Data: Where Machine Learning Meets Life

May 11, 2022

The proliferation of cell phones and cars with GPS connectivity, combined with the explosive growth of the internet of things (IoT), means there is more geospatial data available today than ever before. And it is increasing all the time, from satellite imagery to sensor data.

This data allows machines to draw insights from the patterns of life: people’s mobility habits and behaviors, and how they interact with their environment. From insurers customizing premiums based on real-world driving to planners refining conservation policies in response to urban sprawl, businesses and governments are beginning to understand the benefits of bringing geospatial data into their machine learning applications.

Certainly, the opportunities are huge. And we’ve only really started to scratch the surface of what geospatial data can do.

What’s Driving Increased Interest in Geospatial Data?

The increased supply of analysis-ready geospatial imagery covering wide areas is the main driver of this interest. So-called satellite-as-a-service providers use machine learning algorithms to extract and classify features from satellite imagery, which has driven down the cost of this data. The same algorithms have also made it much easier to update these often-large datasets regularly so they stay relevant. Growing awareness of this kind of geospatial data has, in turn, driven interest in other types of datasets, such as map data.

This wealth of clean, verified, and ready-to-go geospatial data is a game-changer. By applying these datasets to existing problems, we can reveal richer insights.

Human geography is the study of people and their interaction with their environment. So, although the term “geospatial data” sounds a little cold and clinical, it is intimately tied to the human experience. Almost every facet of our daily lives has a geospatial element, which is why nearly all businesses and governments stand to benefit: by better understanding what’s happening in time and space, they can reveal and respond to previously unseen factors and influences. That might mean producing census-style population estimates annually from analysis of satellite imagery, instead of relying on once-a-decade direct outreach campaigns, which would enable much more responsive public services. Or it could simply mean using foot traffic data to identify the best time to put out fresh groceries.

Either way, geospatial data provides insight into the spaces around us and how they can be optimized to create better experiences or more sustainable and efficient operations. And organizations don’t need to wait for a new problem to solve in order to start making use of geospatial data.

How Can Businesses Start Working with Geospatial Data?

In every industry and in every business function, there is value to be had in adding geospatial layers to your analysis. The best way for businesses to prepare to add a geospatial element to their machine learning applications is to revisit their current problem statements. As with most other areas of data science and applied machine learning, it’s a process of retesting your hypotheses with geospatial factors, discovering what works, and building on that iteratively. Each pass enriches your data with geospatial elements, giving you a more accurate and detailed picture of your problems, your predictions, your business, and its customers.

Organizations may, in fact, already have spatial components in their own data, sitting in-house but unused. Anything that identifies a space or a place, such as a zip code, can become a geospatial component. By assigning customers to zip code boundaries, you unlock the ability to think about them spatially and allow a machine learning system to apply spatial understanding. Similarly, tagging sensors with a location adds a geospatial component to your data.
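To make the zip code idea concrete, here is a minimal sketch of assigning customer coordinates to boundary polygons with a ray-casting point-in-polygon test. It assumes each boundary is a simple polygon given as (longitude, latitude) vertices; the `ZIP_A`/`ZIP_B` rectangles, the customer IDs, and the function names are hypothetical placeholders, not real zip code geometry or any DataRobot API. In practice you would load real boundary files (for example, the Census Bureau’s ZIP Code Tabulation Area shapefiles) and use a spatial library with an index rather than looping over every polygon.

```python
from typing import Dict, List, Optional, Tuple

# A boundary polygon as a list of (longitude, latitude) vertices; the ring closes implicitly.
Polygon = List[Tuple[float, float]]


def point_in_polygon(lon: float, lat: float, polygon: Polygon) -> bool:
    """Ray casting: cast a ray eastward from the point and count edge crossings.
    An odd number of crossings means the point lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by the ray.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_zip(lon: float, lat: float, boundaries: Dict[str, Polygon]) -> Optional[str]:
    """Return the first (hypothetical) zip code whose boundary contains the point, or None."""
    for zip_code, polygon in boundaries.items():
        if point_in_polygon(lon, lat, polygon):
            return zip_code
    return None


# Hypothetical, simplified rectangular boundaries for illustration only.
zip_boundaries: Dict[str, Polygon] = {
    "ZIP_A": [(-112.0, 40.7), (-111.9, 40.7), (-111.9, 40.8), (-112.0, 40.8)],
    "ZIP_B": [(-111.9, 40.7), (-111.8, 40.7), (-111.8, 40.8), (-111.9, 40.8)],
}

# Hypothetical customer locations as (longitude, latitude).
customers = {"c1": (-111.95, 40.75), "c2": (-111.85, 40.72), "c3": (-111.50, 40.00)}

customer_zips = {cid: assign_zip(lon, lat, zip_boundaries) for cid, (lon, lat) in customers.items()}
print(customer_zips)  # {'c1': 'ZIP_A', 'c2': 'ZIP_B', 'c3': None}
```

Once each customer carries a zip code, that code can be joined to any other zip-level data (demographics, store density, and so on) as an additional model feature.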

Geospatial Data in Action: What Is the Predicted Sale Price for a Home in Utah?

A more detailed example is using geospatial elements to better predict or assess house values. “What is the predicted sale price for a home in Utah?” might seem like a straightforward question, but accurately determining real estate values is a struggle for many firms. How do you know which variables deserve more weight than others? What are buyers looking for? How will the market change? Location AI can help answer these questions by providing the tools to combine location variables with numerical, categorical, date, image, and text data, unlocking the full potential of your geospatial data.

In the case of real estate in Utah, that means enriching typical listing information with geospatial data to see what really influences house prices. The listing information includes numeric features (price, bedroom count, bathroom count, acreage, etc.), categorical features (garage, exterior, and roof types, etc.), and location geometry (i.e., longitude and latitude). Depending on your hypothesis, the geospatial enrichment data might include selected demographic variables from the U.S. Census Bureau, walkability scores, distance to the nearest highway, school district scores, and distance to recreation.
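Distance features such as highway distance and distance to recreation can be derived directly from a listing’s latitude and longitude. The sketch below is one assumed way to do it, not DataRobot’s implementation: it uses the haversine great-circle formula and a brute-force nearest-point search. The point lists, the field names (`latitude`, `longitude`, `highway_dist_km`, `recreation_dist_km`), and the coordinates are hypothetical. Representing a highway as a handful of sample points is a simplification; a production pipeline would measure distance to the actual line geometry.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; haversine treats Earth as a sphere

LatLon = Tuple[float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_distance_km(lat: float, lon: float, points: List[LatLon]) -> float:
    """Distance to the closest point in `points` (brute force, fine for small lists)."""
    return min(haversine_km(lat, lon, p_lat, p_lon) for p_lat, p_lon in points)


def enrich_listing(listing: Dict, highway_points: List[LatLon], recreation_points: List[LatLon]) -> Dict:
    """Return a copy of `listing` with two hypothetical distance features added."""
    lat, lon = listing["latitude"], listing["longitude"]
    enriched = dict(listing)  # leave the original listing untouched
    enriched["highway_dist_km"] = nearest_distance_km(lat, lon, highway_points)
    enriched["recreation_dist_km"] = nearest_distance_km(lat, lon, recreation_points)
    return enriched


# Hypothetical example data (illustrative coordinates only).
listing = {"price": 450_000, "bedrooms": 3, "bathrooms": 2, "latitude": 40.76, "longitude": -111.89}
highway_points = [(40.76, -111.90), (41.00, -111.90)]  # sample points along a highway
recreation_points = [(40.77, -111.89)]                 # e.g. a park entrance

print(enrich_listing(listing, highway_points, recreation_points))
```

The enriched record keeps the original listing fields and adds the distance columns, so it can be fed to a model alongside the numeric and categorical features; testing whether those columns improve accuracy is exactly the kind of hypothesis check described above.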

As this use case illustrates, adding a spatial component to a problem is a way to better contextualize the human experience within machine learning algorithms. And as our example list of data sources above shows, there’s tons of this information out there. That’s the fun part of geospatial data; the main challenge is finding the best fit for your models.

Hopes for the Future of Geospatial Data in Machine Learning

At a very high level, machine learning seeks to make large amounts of data consumable and understandable. It aims to reveal patterns that would otherwise slip past unnoticed. And in the context of geospatial data, I have several hopes for new developments that increase its potential to significantly improve our understanding and enjoyment of the world around us.

Among these, using technology to shorten the loop between extracting features from satellite imagery and verifying them on the ground would speed up the delivery of verified imagery datasets. (Today, this often involves a person visiting a location to physically check that what an algorithm says is in an image is actually there.) The barriers to entry for consumer-grade enrichment data, primarily cost, also need to come down to move us toward a more open geospatial data environment. OpenStreetMap is a great repository for data, but it is something of an outlier right now; we have yet to see the wide availability of free and open geospatial data that we see in other areas.

And, finally, I hope we can use geospatial data to drive machine learning applications that are predictive rather than reactive. In a previous role as a human geographer, I studied the impact of events such as rising fuel prices on societal instability. If we can train machines to use geospatial data to predict instability automatically, before we have to send out food or medical aid, that would have a real and immediate impact on people’s lives.

Ultimately, a future with more “live” and open geospatial data will enable greater responsiveness and faster time to value and, more importantly, better human experiences.

About the author
Patrick Wilson

Geospatial Engineer formerly of DataRobot

Patrick Wilson has over 10 years of experience solving complex geospatial problems through geospatial data development projects and programs of record. Prior to joining DataRobot, Patrick directly managed on-the-ground data production projects supporting the U.S. government. Patrick holds a Master of Science in Geographic Information Science and a Bachelor of Science in Geography from Florida State University.
