Is analysis necessarily data-driven? Does analytics include optimization?
Posted by Laura McLay on June 25, 2012
INFORMS defines analytics as:
“Analytics — the scientific process of transforming data into insight for making better decisions.”
I believe that optimization is a critical part of analytics. As Jean-François Puget states, “Optimization seems covered by the ‘making better decisions’ part.” As an optimizer, I am completely biased in favor of this answer. The real question Jean-François asks is not about optimization but about why analytics ultimately seems to start with data, even when it involves optimization. Is a data-centric view essential to analytics?
For the past few years, I thought “yes.” I did not have the foresight to take data-centric courses in graduate school, and I have been trying to make up for that ever since. We live in a world that collects increasingly massive and complex data sets. Being able to analyze data is generally a critical starting point for doing analytics.
Some of my research is in the area of emergency medical services. Locating ambulances, for example, is a classic application of facility location, p-center, and p-median problems. Ambulance location models existed before ambulance data was routinely collected. Building models without data required assumptions about what the data would look like (e.g., assuming that calls arrive according to a Poisson process, although I often find that assumption to be realistic!). Other models were built with limited data that was painstakingly collected: I like this model of fire engine travel times that led to the so-called square root law. Having a simple model for fire engine travel times made it easier to build models that were not data-driven. Having access to oodles of data today is a huge help in building good models and in understanding which assumptions are okay to make. The more data the merrier.
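For readers who have not seen these location models, here is a minimal brute-force sketch of the p-median and p-center objectives. It assumes Euclidean distances, a small hypothetical set of demand points (which also serve as candidate sites), and illustrative function names; real ambulance location models typically use road-network travel times and integer programming solvers rather than enumeration.

```python
import math
from itertools import combinations


def nearest_distances(demand, sites):
    """Distance from each demand point to its closest open site (Euclidean, an assumption)."""
    return [min(math.dist(d, s) for s in sites) for d in demand]


def p_median(demand, weights, candidates, p):
    """Pick p sites minimizing the total demand-weighted distance (brute force)."""
    best = None
    for sites in combinations(candidates, p):
        cost = sum(w * c for w, c in zip(weights, nearest_distances(demand, sites)))
        if best is None or cost < best[0]:
            best = (cost, sites)
    return best


def p_center(demand, candidates, p):
    """Pick p sites minimizing the worst-case distance to any demand point."""
    best = None
    for sites in combinations(candidates, p):
        cost = max(nearest_distances(demand, sites))
        if best is None or cost < best[0]:
            best = (cost, sites)
    return best


if __name__ == "__main__":
    # Hypothetical demand: two clusters of calls along a road.
    calls = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(p_median(calls, [1, 1, 1, 1], calls, p=2))
    print(p_center(calls, calls, p=2))
```

The p-median objective rewards low average response distance, while the p-center objective protects the worst-served call; which one is appropriate is a modeling choice that can be made before any call data exists.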
Now, I think differently. Building new models imposes a new and potentially useful way of looking at a problem. Data isn’t necessary to build a model (although it is surely helpful), nor should a data set be the ultimate starting point of every model. My students found it useful when I argued that bus accidents can be modeled as a Poisson process. Without any real data, we could gain useful insight into how to view a cluster of bus accidents that occurred in the Richmond area in a single week.
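The Poisson argument can be made concrete without any data. The sketch below assumes a hypothetical average rate of accidents per week (the value 1.0 is purely illustrative, not a figure from Richmond) and asks two questions: how likely is a week with at least k accidents, and how likely is it that at least one such week shows up over a year? Because counts in non-overlapping weeks are independent under a Poisson process, the yearly probability follows directly.

```python
import math


def poisson_pmf(k, lam):
    """P(N = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def prob_at_least(k, lam):
    """P(N >= k) in a single week, assuming a Poisson count with mean lam."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


def prob_some_week_at_least(k, lam, weeks):
    """P(at least one of `weeks` independent weeks has >= k events)."""
    p = prob_at_least(k, lam)
    return 1.0 - (1.0 - p) ** weeks


if __name__ == "__main__":
    lam = 1.0  # hypothetical average accidents per week (illustrative only)
    k = 4      # size of the "cluster" we are asking about
    print(f"P(>= {k} in a given week): {prob_at_least(k, lam):.4f}")
    print(f"P(some week in a year has >= {k}): {prob_some_week_at_least(k, lam, 52):.3f}")
```

Under these assumptions a four-accident week is rare in any given week (about 2%) but more likely than not to occur somewhere in a year, which is the kind of insight that suggests a cluster may be chance rather than a change in underlying risk.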
Most data that is collected is not analyzed. Much of the analyzed data is “analyzed” using summary statistics. In my experience, statistical models often provide useful explanations of what happened in the past. Analytics is not backward-looking; it is forward-looking. “How would you improve what you do as a result of this data analysis?” is a question that is not asked often enough. Today, it’s hard to imagine data not being part of the answer, but it could certainly be excluded from an “analytics” solution. It is certainly possible, and desirable, to use a non-data-driven approach to look to the future.
I realize that arguing that data need not be part of analytics is going to be a tough sell. The “analytics” culture is aimed at those who are knee-deep in data on any given day. Maybe people like me, who have been trained more on the models/optimization side than on the data side, just don’t want to be left behind. I hope there is some legitimate truth to this blog post.
How essential is data for analytics? What is its role? Please leave a comment.