Can we really trust analytics?
Analytics is becoming more and more ingrained in every aspect of our lives.
From a fitness wearable tracking our steps to a music streaming service predicting the songs we would like to hear, there is no escaping data analysis. As such, there is a growing need for people to trust analytics, particularly the way organisations use it.
The idea for this blog came to me following Gartner’s recent announcement describing the ‘trust factors’ that influence the ethical use of analytics as “transparent, accountable, understandable, mindful, palatable and mutually beneficial”. In their announcement, Gartner highlights that leading data-driven organisations must recognise that people’s trust in analytics is as valuable as the business outcomes.
This made me reflect on some lessons I’ve learned as a data scientist, and on how they relate to the interplay between data, models, trust, and results:
- When developing models, balance data fitting with your understanding and domain expertise
For credit risk scoring, models must conform to regulations, and adverse credit decisions must be explainable to affected consumers based on the score formula used. In any domain, a business that cannot explain important analytically derived decisions could one day come under pressure for failing to do so, and could lose its customers’ trust if a decision were regarded as nonsensical, unfair, or unethical.
So, how can we balance human judgment with letting the data speak for itself? I’m fond of segmented scorecards for their transparency, their flexibility, and their ability to incorporate palatability constraints. I prefer to benchmark these models, and to guide their design (such as binning, variable selection and segmentation schemes), using modern ensemble learning algorithms that can discover unexpected and complex relationships. This is a good methodology for balancing accurate data fitting with understanding and domain expertise.
- Understand the process that generated the data and be mindful of biases
Learning from a data set about causality is a more demanding scientific task than finding humble correlations, and it can sometimes be an impossible task. Two key challenges relating to data conditions are omitted variable biases and historic treatment selection biases. For example, a model for price elasticity of demand for ice cream could be severely biased if historic prices had been managed based on weather conditions and if we were to omit weather-related control variables from our model.
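To make the ice cream example concrete, here is a minimal simulation sketch in plain Python. All numbers in it are assumptions chosen for illustration (a true price effect of -2 units of demand per unit of price, prices raised by 0.1 per degree of temperature, and so on); none of them come from real data. It compares the price coefficient from a regression that omits temperature with the coefficient from one that controls for it.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def simulate(n=5000, true_price_effect=-2.0, seed=0):
    """Hypothetical ice cream market: prices were historically raised on hot days."""
    rng = random.Random(seed)
    temp, price, demand = [], [], []
    for _ in range(n):
        t = rng.gauss(20, 5)                      # temperature (assumed distribution)
        p = 2.0 + 0.1 * t + rng.gauss(0, 0.3)     # price managed based on weather
        d = 50 + true_price_effect * p + 1.5 * t + rng.gauss(0, 2)
        temp.append(t)
        price.append(p)
        demand.append(d)
    return temp, price, demand

def naive_slope(price, demand):
    """OLS slope of demand on price alone (temperature omitted)."""
    return cov(price, demand) / cov(price, price)

def controlled_slope(price, temp, demand):
    """OLS coefficient on price when temperature is included as a control."""
    spp, stt, spt = cov(price, price), cov(temp, temp), cov(price, temp)
    spd, std = cov(price, demand), cov(temp, demand)
    return (spd * stt - std * spt) / (spp * stt - spt ** 2)

temp, price, demand = simulate()
naive = naive_slope(price, demand)
controlled = controlled_slope(price, temp, demand)
print(f"price effect, weather omitted:    {naive:.2f}")
print(f"price effect, weather controlled: {controlled:.2f}")
```

Under these assumptions the naive estimate even has the wrong sign: hot days bring both higher prices and higher demand, so a model without weather concludes that raising prices increases sales. Adding the control recovers an estimate close to the assumed true effect.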
For a transparent causal modelling methodology that fosters trust (and that alerts the analyst when treatment selection bias is too strong to infer data-driven causal relations), I recommend developing treatment propensity scores and applying matched sampling before “mining” non-experimental data for causal effects.
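As a rough illustration of this recommendation, the sketch below fits a one-covariate propensity score with a hand-rolled logistic regression and then matches each treated unit to its nearest control within a caliper. The data, the function names (`fit_propensity`, `matched_effect`), the caliper of 0.02 and the assumed true treatment effect of 1.0 are all hypothetical choices for this example, not a reference implementation of any particular product or library.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity(x, treated, steps=200, lr=0.5):
    """Estimate P(treated | x) by gradient ascent on the logistic log-likelihood."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for xi, ti in zip(x, treated):
            err = ti - sigmoid(w * xi + b)
            grad_w += err * xi
            grad_b += err
        w += lr * grad_w / n
        b += lr * grad_b / n
    return [sigmoid(w * xi + b) for xi in x]

def matched_effect(score, treated, y, caliper=0.02):
    """Match each treated unit to its nearest control (with replacement) on the score.

    Returns the mean outcome difference over matched pairs and the share of
    treated units that found a control within the caliper.
    """
    controls = [(s, yi) for s, t, yi in zip(score, treated, y) if t == 0]
    diffs, n_treated = [], 0
    for s, t, yi in zip(score, treated, y):
        if t == 0:
            continue
        n_treated += 1
        nearest = min(controls, key=lambda c: abs(c[0] - s))
        if abs(nearest[0] - s) <= caliper:
            diffs.append(yi - nearest[1])
    if not diffs:
        raise ValueError("no treated unit has a comparable control")
    return sum(diffs) / len(diffs), len(diffs) / n_treated

# Simulated non-experimental data: units with high x were treated more often.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
treated = [1 if rng.random() < sigmoid(xi) else 0 for xi in x]
y = [1.0 * t + 2.0 * xi + rng.gauss(0, 1) for xi, t in zip(x, treated)]

y_treated = [yi for yi, t in zip(y, treated) if t == 1]
y_control = [yi for yi, t in zip(y, treated) if t == 0]
naive = sum(y_treated) / len(y_treated) - sum(y_control) / len(y_control)

score = fit_propensity(x, treated)
effect, share_matched = matched_effect(score, treated, y)
print(f"naive difference in means: {naive:.2f}")
print(f"matched estimate:          {effect:.2f}")
print(f"treated units matched:     {share_matched:.0%}")
```

The share of treated units that find a match acts as the alert mentioned above: if it is low, treated and untreated customers barely overlap, and the data cannot support a credible causal estimate no matter which model is fitted afterwards.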
- Think strategically and design safe experimentation right into your feedback loop
Every business treatment applied to a customer today not only affects that customer’s future journey and metrics, it also yields new data samples that could, and should, inform future decisions. Whenever we’re plotting a long-term analytic roadmap, let’s think about optimising today’s decisions not with a myopic view (which is to exploit our current best estimates of today’s optimal action), but with a strategic view (which is to balance the exploration-exploitation trade-off by continually testing ethical and safe decision alternatives). This approach greatly mitigates selection bias issues, and helps pave the road to transparent, data-driven, prescriptive modelling.
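One simple way to picture the difference between the myopic and the strategic view is an epsilon-greedy policy, sketched below. This is only one of many possible exploration schemes, and it is my choice for the illustration. The three treatments, their response rates and the 10% exploration share are assumed values; in practice every arm would have to be a treatment already vetted as ethical and safe.

```python
import random

def run_policy(true_rates, epsilon, rounds=5000, seed=1):
    """Simulate a treatment policy over a set of pre-approved treatments.

    With probability epsilon a treatment is chosen at random (exploration);
    otherwise the treatment with the best observed response rate so far is
    chosen (exploitation). epsilon=0 is the purely myopic policy.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms
    successes = [0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            observed = [s / c if c else 0.0 for s, c in zip(successes, counts)]
            arm = observed.index(max(observed))
        counts[arm] += 1
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
    return counts, successes

# Hypothetical response rates; the third treatment is the best but untested one.
true_rates = [0.05, 0.10, 0.30]

greedy_counts, greedy_successes = run_policy(true_rates, epsilon=0.0)
explore_counts, explore_successes = run_policy(true_rates, epsilon=0.1)

print("myopic policy, times each treatment was used:   ", greedy_counts)
print("exploring policy, times each treatment was used:", explore_counts)
print("total responses (myopic vs exploring):",
      sum(greedy_successes), "vs", sum(explore_successes))
```

In this sketch the myopic policy never leaves the first treatment, because it has no data on the alternatives and therefore never has a reason to try them. The exploring policy spends a small share of decisions on testing, and the randomised choices also produce data that is free of treatment selection bias.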
- Use sequential decision analytics to realise the benefit of information-gathering actions
Conventional prescriptive analytics exploits the available data to arrive at a decision. For some decisions or customers, the available data may be too sparse, too stale, or too biased to make data-driven decisions with confidence. In those cases, falling back on a default treatment (which may be sub-optimal) or gathering additional information are your only real options.
I find it beneficial to think about analytics as an element of a virtuous feedback loop. Data is constantly being turned into predictions and decisions that affect consumers and outcomes, which in turn lead to new information that can improve future models and future decisions. The feedback loop perspective offers various interception points and modes of operation to foster transparency, learn about causes and their effects, and enshrine trust.
A customer must believe that when data is collected it is in their interest, and the business must act on the received information in a way that creates a win for both sides. The music streaming service I’m using is a simple example of trust earned that pays dividends. I’m happy to volunteer my ratings information in exchange for improved future recommendations. Seeing these recommendations actually improve over time makes me a loyal customer. Ultimately, we trust in data that is used in a transparent and mutually beneficial way.