Enabling Tomorrow’s Business Analysts with Predictive Analytics: Part One
Thanks to Dr. Kai R. Larsen, Associate Professor of Information Management, University of Colorado, for contributing this two-part guest blog.
Part 1: Addressing the Challenges and Complexities Inhibiting Business Users’ Adoption of Predictive Analytics
Predictive analytics is reshaping business and society. Our very perception of reality is changing due to algorithms catering to our instincts, needs, and wants. This raises serious questions about how business schools should prepare graduates. One answer may be to teach predictive analytics to undergraduate students across all majors. What would it take to implement this obviously important vision, and why is it not currently being done?
As a business analytics professor, I teach a mix of business-minded undergraduates: from those who immediately understand how predictive analytics has reshaped their future jobs (Information Management and Marketing), to those whose fields have long had different flavors of business analytics infused into their core (Operations Management and Finance), to those for whom predictive analytics is currently reshaping “only” a small part of their discipline (Accounting). What is becoming clear is that all of these majors must, at a minimum, understand predictive analytics conceptually in order to make decisions that will affect the future of their companies, as these tools continue to provide business insights and drive change within and outside their enterprises.
To be effective contributors for their future employers, students must learn how to blend data from dozens of sources, including spreadsheets, databases (relational and NoSQL), social media, and external data providers. The combined data must then be cleaned, imputed, summarized, and put through dozens of pre-processing steps that will change between datasets, depending on which predictive algorithms will be employed. And even ignoring the hundreds of slight variations on each, there are several core types of algorithms to learn, plus ensemble algorithms that combine many versions of the core types with great success:
- Bayesian approaches
- Decision trees
- Regression (classical and regularization)
- Neural networks (classical and deep learning)
- Ensemble algorithms
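To make the list concrete, here is a minimal sketch, assuming Python with scikit-learn and a synthetic dataset standing in for real business data, that fits one representative of each family. The estimators are illustrative choices, not the only members of each family: Gaussian naive Bayes for the Bayesian approaches, an L2-regularized logistic regression, a small multilayer perceptron (a classical neural network, not deep learning), and a random forest as the ensemble. It also illustrates algorithm-dependent pre-processing: the regression and neural network get standardized features, while the tree-based models and naive Bayes use the raw features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a cleaned, blended business dataset (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# One illustrative estimator per family. Pre-processing differs by algorithm:
# regularized regression and neural networks are scale-sensitive, so their
# features are standardized; trees, random forests and naive Bayes use raw features.
models = {
    "Bayesian (Gaussian naive Bayes)": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Regression (L2-regularized logistic)": make_pipeline(StandardScaler(), LogisticRegression(C=1.0)),
    "Neural network (small MLP)": make_pipeline(
        StandardScaler(), MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    ),
    "Ensemble (random forest)": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and record accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name:38s} test accuracy = {acc:.3f}")
```

Which family wins depends on the data, which is exactly why algorithm selection becomes its own problem.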
And all of this ignores the learning related to unsupervised approaches such as market basket analysis and related recommendation algorithms, as well as the clustering and dimensionality reduction algorithms that have been key to the success of Marketing for decades. The success of natural language processing (NLP) has raised questions about whether, and to what extent, text can improve predictive results, but answering them comes with major training implications.
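At its core, market basket analysis counts which items occur together. The sketch below, in plain Python, computes support (the share of baskets containing both items) and confidence (the share of baskets containing the left-hand item that also contain the right-hand item) for item pairs. The baskets are made up for illustration, and this pairwise counting is a simplification: a production tool would use a full algorithm such as Apriori or FP-Growth to handle larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (lhs, rhs, support, confidence) for item pairs meeting min_support."""
    n = len(transactions)
    item_counts = Counter(item for basket in transactions for item in basket)
    pair_counts = Counter(
        pair for basket in transactions for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Emit the rule in both directions; confidence(lhs -> rhs) = P(rhs | lhs).
        for lhs, rhs in ((a, b), (b, a)):
            rules.append((lhs, rhs, support, count / item_counts[lhs]))
    # Strongest rules first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules[:5]:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On these baskets the strongest rule is beer → diapers: every basket with beer also contains diapers (confidence 1.0), and the pair appears in 3 of the 5 baskets (support 0.6).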
Overwhelmed? As faculty, we surely are. These complexities, combined with enterprise needs, have driven the popularity of MS programs in business analytics, which have now popped up in almost all notable business schools. As this market nears saturation, we must realign our focus to enable all our undergraduates to communicate with and properly direct the new class of data specialists. While this will take a while, the reality is that almost all of our current undergraduates will likely be asked to manage or collaborate with data specialists within their first few years of employment. To do so, they must themselves possess at least a core set of data blending and predictive analytics skills.
This year marks a decade since I first started bringing predictive analytics into the business school classroom, sometimes with great success and other times markedly less so. My failures taught me more about what is possible, and where the boundaries of the possible lie. Sometimes I failed because my skills had not kept up with what I wanted to teach, and sometimes it seemed that my university's technological capabilities had not yet caught up with my “big(ger) data” aspirations. However, I can conclusively say that most of my failures were caused by abysmal tools and the massive number of manual steps needed to compensate for their shortcomings, which always pushed more content into an already bloated class.
Only in the last year have I seen this trend begin to reverse. I now believe that with the right tools, predictive analytics can, for the first time, reasonably be brought into the core of business school education. This is quite fortuitous, as predictive analytics is changing society at a dizzying speed. Sending undergraduates into industry without at least a conceptual knowledge of predictive analytics may be akin to sending them out without knowledge of accounting or marketing. In all these cases, students could probably survive for a while if they specialized properly outside of those areas, but they would clearly be unprepared for collaborative work and most leadership positions.
Starting with predictive analytics itself, and ignoring the data blending required to “feed” the algorithms a balanced diet of data, I see two complexities that have to be addressed:
- Algorithm-specific pre-processing of data: For example, students must know that some algorithms, such as regression, will throw out a whole row of data if any feature (independent variable or predictor; fill in your favorite name) is missing a value. This brings with it a whole set of knowledge requirements about how best to impute missing values.
- Algorithm evaluation and selection: There are hundreds, if not thousands, of algorithms to select from; some are commercially restricted, but most are open source and available in specific packages in R and Python, or in special-purpose, cutting-edge packages like TensorFlow from Google. Predictive analytics used to require both an understanding of all of these and knowledge of how to select among them, since the same algorithm will seldom be best for two different problems. As data sizes grow, evaluating all these algorithms places outsize demands on the infrastructure required to teach predictive analytics. These two complexities together explain why predictive analytics has remained within the purview of year-long MS programs in analytics.
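Both complexities can be sketched in a few lines, assuming Python with pandas and scikit-learn and a made-up dataset with hypothetical column names. First, `dropna()` shows how many rows a complete-case fit would keep (R's `lm()` drops such rows by default, while scikit-learn's `LinearRegression` refuses missing values outright). Second, wrapping each candidate algorithm in a pipeline with an imputer lets 5-fold cross-validation compare imputation strategies and algorithms on equal footing; the three candidates are illustrative, not a recommended shortlist.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 200

# Hypothetical features and a target that depends linearly on them plus noise.
X = pd.DataFrame(rng.normal(size=(n, 3)), columns=["tenure", "monthly_fee", "support_calls"])
y = 2.0 * X["tenure"] - 1.0 * X["monthly_fee"] + 0.5 * X["support_calls"] + rng.normal(scale=0.5, size=n)

# Blank out roughly 10% of the values in every column, completely at random.
X_missing = X.mask(rng.random(X.shape) < 0.10)

# Complexity 1: listwise deletion keeps only rows with no missing feature.
complete_rows = len(X_missing.dropna())
print(f"rows kept by listwise deletion: {complete_rows} of {n}")

# Complexity 2: compare imputation + algorithm combinations with 5-fold CV (mean R^2).
candidates = {
    "mean imputation + linear regression": make_pipeline(SimpleImputer(strategy="mean"), LinearRegression()),
    "median imputation + linear regression": make_pipeline(SimpleImputer(strategy="median"), LinearRegression()),
    "median imputation + random forest": make_pipeline(
        SimpleImputer(strategy="median"), RandomForestRegressor(n_estimators=100, random_state=0)
    ),
}
cv_r2 = {
    name: cross_val_score(model, X_missing, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
for name, score in sorted(cv_r2.items(), key=lambda kv: -kv[1]):
    print(f"{name:40s} mean R^2 = {score:.3f}")

best = max(cv_r2, key=cv_r2.get)
print("selected:", best)
```

Because the imputer sits inside the pipeline, it is refit on each training fold, so no information from the validation fold leaks into the imputed values.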
To take the next step of developing students who can actually be productive analytics contributors in the enterprise, we must also teach data access and blending. I do not believe these skills need to be taught to all majors. However, if more than conceptual analytics capabilities are desired in the business school core, students need certain minimum data blending skills. This presents three additional challenges:
- Accessing data files and understanding their functions: As anyone who has tried to teach R or Python to undergraduates in a core, required class can testify, getting students to consistently remember how to read comma-separated files is a major challenge, not to mention the sheer multitude of different functions required for Excel files, database tables, and Twitter and other social media feeds.
- Joining different data together: Generally speaking, this requires logic that derives from relational algebra, which presents a major challenge for most students the first time they encounter it.
- Aggregating and summarizing data: For example, if we want to analyze the likelihood of a customer switching cellphone providers the day their contract expires (churn), we may want to use information from the five times they called customer service (data may exist on their level of satisfaction on a scale from 1 to 5). We would need to join their customer record with the customer service table and then aggregate the resulting table back down to the customer level, adding features for their average, minimum, maximum, and final level of satisfaction.
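The join-then-aggregate workflow in the churn example might look like this in Python with pandas. The two tables are small hypothetical extracts held in strings so the sketch is self-contained; in practice each source would need its own reader (for example `pd.read_csv`, `pd.read_excel`, or `pd.read_sql`), which is exactly the multitude of functions students struggle to remember.

```python
import io

import pandas as pd

# Hypothetical extracts, inlined so the sketch runs on its own.
customers_csv = io.StringIO(
    "customer_id,plan\n"
    "1,basic\n"
    "2,premium\n"
    "3,basic\n"
)
calls_csv = io.StringIO(
    "customer_id,call_date,satisfaction\n"
    "1,2024-01-05,4\n"
    "1,2024-01-28,3\n"
    "1,2024-02-10,2\n"
    "1,2024-03-02,2\n"
    "1,2024-03-20,1\n"
    "2,2024-01-15,5\n"
    "2,2024-04-02,4\n"
)
customers = pd.read_csv(customers_csv)
calls = pd.read_csv(calls_csv, parse_dates=["call_date"])

# Join: one row per (customer, call); the left join keeps customer 3, who never called.
joined = customers.merge(calls, on="customer_id", how="left")

# Aggregate back down to one row per customer. Sorting by date first
# means "last" picks the satisfaction score from the most recent call.
churn_features = (
    joined.sort_values("call_date")
    .groupby(["customer_id", "plan"])
    .agg(
        sat_mean=("satisfaction", "mean"),
        sat_min=("satisfaction", "min"),
        sat_max=("satisfaction", "max"),
        sat_final=("satisfaction", "last"),
        n_calls=("call_date", "count"),
    )
    .reset_index()
)
print(churn_features)
```

Customer 1, with five calls, ends up with an average satisfaction of 2.4 and a final score of 1. Customer 3 never called, so the satisfaction features come out missing and `n_calls` is 0; those gaps would then need the imputation decisions discussed earlier.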
To summarize, predictive analytics can be hard. There’s widespread consensus that more non-technical fields in the business world should be bringing data science into their everyday operations to enable smarter, faster business decisions, but there’s been a real barrier to doing so. The complexities of delivering predictive analytics, on top of the preliminary work of making data available, cleansed, and prepped for analytics, can seem overwhelming, especially if you’re aiming to go from zero experience to productivity within a semester.
In Part 2 of this blog, I will describe reasons for bringing DataRobot and Alteryx into the classroom, and the success students are having with predictive analytics as a result.