You’re probably familiar with the phrase “lies, damned lies, and statistics”. It’s primarily a complaint against the “creative” use of data.
Of course, it’s also a witty aside about the power of data to mislead, which tells us something about propaganda and our long-time distrust of the political class.
But perhaps most of all, it is a salutary warning for data scientists: data doesn’t have to be wilfully misrepresented in order to mislead.
When Microsoft launched an AI-driven chatbot on Twitter in 2016, the technology company had to apologise and take it offline 16 hours later. Unfortunately, “Tay” had learnt the very worst the Internet had to offer and was posting racist tropes and slurs.
Tay isn’t the only machine-learning project to demonstrate the ease with which existing data can bake in the wrong things. Amazon had to drop a project that automated the job of reviewing applicants’ résumés when searching for top talent because it was biased against women.
The decision engines were being taught on existing data. And since few women currently make it through the glass ceiling and into the boardroom, Amazon’s AI made decisions based on that existing bias.
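One way to catch this kind of inherited bias early is to look at the outcomes in the historical data before any model is trained on it. The following is only a minimal sketch in Python: the records, the group labels and the `selection_rates` helper are all made up for illustration and have nothing to do with Amazon’s actual system or data.

```python
from collections import Counter

# Hypothetical historical hiring records: (group, was_hired).
# These values are invented purely to illustrate the check.
history = [
    ("men", True), ("men", True), ("men", False), ("men", True),
    ("women", False), ("women", True), ("women", False), ("women", False),
]


def selection_rates(records):
    """Return the fraction of positive outcomes for each group."""
    totals = Counter(group for group, _ in records)
    positives = Counter(group for group, hired in records if hired)
    return {group: positives[group] / totals[group] for group in totals}


rates = selection_rates(history)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} of past applicants were hired")
```

A large gap between the groups doesn’t prove that the past decisions were unfair, but it is a signal that a model trained on those decisions is likely to learn and repeat the same pattern.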
While machines absorb bias as a direct result of the existing data used to teach them, we are quite capable of behaving the same way.
A study by Lee Jussim at Rutgers University showed that, in many cases, scientific researchers had come to a conclusion that fitted the data but had not eliminated alternative conclusions that could have explained their data equally well.
When other scientists came to review their work, they were unable to replicate the conclusions drawn. Clearly, even experienced scientists can read what they want to into their “results”.
There are all kinds of biases. Did you hear the one about the sales manager who was so keen to launch his new product that he completely ignored the advice of experts? There’s no point commissioning market analysis if you simply ignore it when it tells you what you don’t want to hear.
For this reason, it’s always a good idea to work with a team of people who come to the data with different perspectives and don’t have a particular axe to grind.
Ideally, you can pull together a team that is able to challenge the data and your findings – a team willing to dig deeper into the data to understand what is really going on and why.
And if you aren’t sure whether you and your team are drawing the right conclusions, consult more widely.
If you allow the data to lie to you (wilfully or unwittingly), you’ll be left with bad decisions and disappointing results.