Automating Machine Learning for Time Series – Navigating in the Valley of the Blind
Guest post by Dr. Chris Marshall, Associate Vice President, Big Data and Analytics, Cognitive Computing, IDC Asia/Pacific
Heraclitus, the ancient Greek philosopher, noted that change is the only constant in the universe. Since the beginning of time, seers and prophets have tried to pull back the curtain on the future using omens, insight, and experience and, since the Age of Enlightenment, data.
It falls to time series analysis to extract the meaningful patterns embedded in historical data and use them to extrapolate into the future. More formally, time series models forecast output variables from patterns in their own history (such as trends, seasonality, and special events) and from their relationship with input variables. Applications abound: forecasting future demand for better operational planning, monitoring complex systems, detecting anomalous signals in IT system logs, or identifying the factors driving equipment downtime at a manufacturing site.
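To make the idea of trend and seasonality concrete, here is a minimal sketch in plain Python. It assumes a straight-line trend and a fixed, known seasonal period, which real data rarely honors so neatly; the function names `fit_trend_seasonal` and `forecast` and the sample numbers are purely illustrative.

```python
from statistics import mean


def fit_trend_seasonal(series, period):
    """Fit a least-squares straight-line trend, then average what is left
    over at each position in the seasonal cycle (assumed to be `period` long)."""
    xs = list(range(len(series)))
    x_bar, y_bar = mean(xs), mean(series)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, series)]
    seasonal = [mean(residuals[i::period]) for i in range(period)]
    return intercept, slope, seasonal


def forecast(model, n_observed, horizon, period):
    """Extrapolate the fitted trend and repeat the seasonal pattern."""
    intercept, slope, seasonal = model
    return [
        intercept + slope * t + seasonal[t % period]
        for t in range(n_observed, n_observed + horizon)
    ]


# Illustrative data: steady growth plus a repeating four-step pattern.
pattern = [3, -1, -2, 0]
history = [10 + 2 * t + pattern[t % 4] for t in range(24)]
model = fit_trend_seasonal(history, period=4)
print(forecast(model, n_observed=len(history), horizon=4, period=4))
```

The forecast simply continues the line and replays the cycle, which is why this kind of model works only as long as the past pattern holds.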
Just-in-time inventory, the Internet of Things (IoT), digital transformation, real-time analytics, the rise of e-commerce, and the use of unstructured data have forced businesses to up their game in analyzing time series data. Of course, time series analysis is not new. We have had formal statistical techniques to analyze trends, cycles, and randomness for decades. The challenge is that these techniques struggle to scale to more complex, real-world time series problems, which involve huge data sets and many potential features captured over long time periods. Into this breach step machine learning techniques, such as ridge regressors, boosted trees, and neural networks. Machine learning approaches rely far less on the statistical assumptions (about normality, linearity, and stationarity) that can undermine traditional formal approaches; they also produce usable, scalable models that adapt more readily to multivariate analysis and, when validated carefully, carry less risk of overfitting.
However, to date, some of the challenges of machine learning (a shortage of data science skills, a complex development process, and a lack of standardized tools) have limited its application to time series analysis. This appears set to change with the development of automated machine learning and its application to time series.
Automated machine learning is best understood as automatically searching through the space of potential models and, for a given data set, choosing the best model based on some predefined criteria. At the very least, this greatly increases data scientist productivity and eventually will empower business users and domain experts with self-service machine learning capabilities that require little or no data science technical skills.
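The search described above can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions, not how DataRobot or any other product works: the three candidate forecasters, the `CANDIDATES` table, and `select_best_model` are hypothetical names, and the predefined criterion is assumed to be mean absolute error on the most recent observations, which are held out from training.

```python
from statistics import mean


# Hypothetical candidate models: each maps a training history to a forecast.
def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def moving_average(history, horizon, window=4):
    """Repeat the average of the last `window` values."""
    return [mean(history[-window:])] * horizon


def seasonal_naive(history, horizon, period=4):
    """Repeat the values from the most recent full cycle."""
    return [history[-period + (h % period)] for h in range(horizon)]


CANDIDATES = {
    "naive": naive,
    "moving_average": moving_average,
    "seasonal_naive": seasonal_naive,
}


def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def select_best_model(series, holdout):
    """Score every candidate on the last `holdout` points and pick the best.

    The split respects time order: models see only data that precedes
    the period they are asked to forecast.
    """
    train, test = series[:-holdout], series[-holdout:]
    scores = {
        name: mean_absolute_error(test, model(train, holdout))
        for name, model in CANDIDATES.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative data with a repeating four-step pattern.
series = [10, 20, 30, 40] * 6
best, scores = select_best_model(series, holdout=4)
print(best, scores)
```

A real automated system would search a far larger space (feature engineering, algorithms, and hyperparameters) and would typically score candidates over several backtest windows rather than a single holdout.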
DataRobot is a leader in the development and deployment of automated machine learning tools and has recently added a time series capability to its automated machine learning platform. This gives customers turnkey, end-to-end automation that searches through the modeling space to produce “optimal” target models.
Of course, for technologies like automated machine learning to be useful, organizations and business users must understand the real potential of time series analytics. Enterprises need to timestamp everything. They need an enterprise architecture for data that enables the entire sense, analyze, respond, learn loop in a timely and appropriate fashion. For algorithmic trading, this means a millisecond response. For sales predictions, it means days. IoT applications fall somewhere in between.
Keep in mind, of course, and echoing Heraclitus again, that change is constant and even affects our predictive models. Some things can be predicted well into the future, like the orbits of planets, but that’s unusual. Even the most granular models can only predict the weather a few days ahead. There are limits to what can be learned about the future no matter what technologies we deploy. But these tools and technologies open our eyes a little to the future, and in the valley of the blind, the one-eyed man is king.