The Flaws of Predictive Analytics

As I write this post, I’m listening to a streaming internet radio station that picks songs for me based on my past listening sessions and preferences. I generally enjoy this. The more songs I tell the station I like, the better it gets at finding other songs that I enjoy. It even discovers songs and artists for me that I am not familiar with.

This seems wonderful, and generally it is, but I wonder what music I am not hearing. What about artists I might enjoy who fall outside the scope defined by my previous choices? How does the station account for changes in my musical taste? What about the misinformation I give it? I have definitely clicked dislike on songs I’m embarrassed by when I am in charge of playing music for a group of friends, even when I don’t actually dislike them, and which songs those are varies with the group I am with. The point I am making is that the data the radio station uses is useful but imperfect. Sources of data are dynamic and shaped by context in countless ways. While the radio station has found a signal for my preferences within all the available musical noise, what does it know about the noise within my signal? Understanding the sources of data is essential to using it effectively.

Predictive analytics use past behavior as a predictor of future behavior. Past behavior is clearly highly predictive and useful, but what do we know about the contexts surrounding the behavior these models are built on? Each behavior is a response to a particular environment, and even as data collection improves our understanding of these behaviors, companies continue, as they always have, to design the environments people interact with. The behavior a model learns from is therefore partly a product of those design choices.

McKinsey & Company recently released an article called “Cracking the digital-shopper genome” that describes the challenge, and opportunity, of trying to assimilate, organize, and use all of the data surrounding a shopper’s decision. What is clear about shoppers, and their corresponding data and behavior, is that they are dynamic. The article praises a company for creating a consumer website experience tailored to each individual based on that consumer’s click behavior. This type of personalization is assumed to be the ideal online shopping experience: a website that “knows” and “understands” you, taking you to the exact products you want at the exact time you want them. Generally, this would seem useful.

But what if it isn’t? The company designing the website makes choices, intentional or otherwise, about how the collected data is weighted and used to inform applications. Relying on the “wrong data” can have deleterious effects, and it isn’t always obvious that you have been using the “wrong data” until the consequences make it very apparent. As an example, look at Google’s recent embarrassment over its image recognition algorithm, which mislabeled people as animals. The source data for this algorithm, which I won’t speculate on, clearly wasn’t sufficient to reliably distinguish people from animals.

For businesses selling products online, data choices matter just as much. In tailoring a website to a specific consumer based on that consumer’s click behavior, businesses may create unintended hurdles that deter the consumer from purchasing. Suppose the consumer is shopping for a friend whose tastes differ from their own. A website tailored to the shopper may make it harder to find an item for the friend. There are solutions to this, of course, such as asking the consumer whether they are shopping for themselves or someone else, which tells the website to switch to the appropriate source data. The takeaway is that businesses must be intentional about how they design and use their data. This requires businesses to clearly understand where their data comes from, how their business operates, and what their customers value. There is still an art to using data appropriately. Using data leads to better-informed decisions, not fully informed ones.