If scientists don’t share their data, how can we trust them? – iNews
This is Geek Week, my newsletter about whatever nerdy things have happened to catch my eye over the past seven days. Here’s me, musing about something I don’t fully understand in an attempt to get my head around it: I imagine that’s how most editions will be. If you’d like to get this direct to your inbox, every single week, you can sign up here.
There’s this friend of mine, Nick Brown, who’s helped me on a few stories I’ve written over the years. The stories are usually along the lines of “Can I actually trust this scientific research?”
He has spotted some very interesting things. He and another scientist, James Heathers, came up with a little tool called GRIM, or “granularity-related inconsistency of means”. It sounds fancy but it’s very simple. Imagine you see a scientific paper that reports that it looked at eight children under the age of 10, and that their average age was 5.33.
You might think that sounds fine, but it’s impossible: if there are eight children, and each age has been entered as a whole number, then the sum of their ages is a whole number, and so eight times the average must be a whole number too. Eight times 5.33 is 42.64. Something is wrong in the data.
(In case you’re wondering why: the average of eight whole numbers has to end with .00, .125, .25, .375, .5, .625, .75, or .875. Take any whole number you like and divide it by eight, and you’ll see. And it’s easy enough to understand why: any whole number divided by eight equals that whole number times one-eighth, and one-eighth is 0.125, so any average of eight whole numbers has to be a multiple of 0.125.)
In the case of just eight kids, that’d be pretty obvious. But GRIM lets you do it with larger sample sizes. And sometimes – not always, but sometimes – an impossible mean is an indicator that the data hasn’t just been wrongly put in, but that someone is committing fraud. Brown and Heathers have detected quite a few cases.
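That generalisation is simple enough to sketch in a few lines of code. Here’s my own toy version of the idea in Python – not Brown and Heathers’ actual tool, just an illustration of the arithmetic – which checks whether a mean, reported to a given number of decimal places, could possibly have come from a sample of whole-number values:

```python
def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """Could a mean reported to `decimals` places come from n whole numbers?

    The sum of n whole numbers is itself a whole number, so we find the
    nearest achievable whole-number sum, turn it back into a mean, and
    check whether it rounds to the reported figure.
    """
    nearest_sum = round(mean * n)
    return round(nearest_sum / n, decimals) == round(mean, decimals)

# The example from the text: eight children with a reported mean age of 5.33
print(grim_consistent(5.33, 8))  # False: no eight whole-number ages average to 5.33
print(grim_consistent(5.38, 8))  # True: ages summing to 43 give 43/8 = 5.375
```

The same check works for a mean of 200 subjects reported to two decimal places; it just gets less likely to catch errors as the sample grows, because more reported means become achievable.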
More often, using tools including but not limited to GRIM, they find that scientists are doing other bad practices. A famous one – uncovered by Brown and Heathers and my old colleague at BuzzFeed, Stephanie Lee – was the Cornell food scientist Brian Wansink, who had not been making up data, but who had been chopping it up in various ways until it said something, and then publishing papers off that something.
(If you’re wondering why that’s bad: imagine I thought that eating carrots made you taller, so I took 10 people who ate carrots and 10 people who didn’t, and measured their heights, and there wasn’t much difference. So I thought, how about if eating carrots makes you more ginger? But there’s no difference there, either. So I test whether it makes your teeth stronger, or your breath smellier, and keep trying new things, until I find that my carrot-eating group is, on average, slightly more likely to, I dunno, enjoy rock-climbing or something. You’re giving yourself more and more chances to find a coincidence.)
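You can put a rough number on that. This is my own back-of-envelope calculation, assuming each unrelated outcome has the conventional one-in-20 chance of a fluke “significant” result (p < 0.05):

```python
# Chance of at least one spurious "finding" when you test k unrelated
# outcomes, each with a 5 per cent false-positive rate (p < 0.05).
# The chance of NO fluke in k independent tests is (1 - 0.05)^k.
def chance_of_coincidence(k: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20):
    print(f"{k:2d} outcomes tested: "
          f"{chance_of_coincidence(k):.0%} chance of at least one fluke")
```

With one outcome it’s 5 per cent; with 20 it climbs to roughly two in three, which is why fishing through enough variables almost always “finds” something.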
Share options
Over the last 10 years or so, science has started to realise that it’s got a lot of problems like this, which is part of why lots of old studies don’t replicate (I’ve talked about this in Geek Week before). And part of the way that people are fixing it is by checking over each other’s work.
Many journals now require scientists to include a statement in their papers saying that they will share data with other scientists if requested. It shouldn’t, you’d think, be a big deal. I imagine that there will be the odd case where you have to be careful about identifiable data – certainly the NHS has been working with Ben Goldacre recently to find ways of using its vast troves of patient data in safe and responsible ways. But if you’ve done some psych study on 50 undergraduates to find out if washing your hands makes you feel less guilty or whatever, then it shouldn’t be a problem.
I’m a bit unnerved, then, to find that a new study has actually checked how many of these scientists are willing to share their data, and the answer is: not very many.
The study, published in the Journal of Clinical Epidemiology, looked at more than 3,500 studies from 300 journals. About 3,400 of those articles had “data availability statements”, of which 42 per cent said that the data was available upon reasonable request.
(I had assumed the other 58 per cent wrote “You can’t check, you’ll just have to trust us,” which would be admirable for its honesty and chutzpah, if not for its dedication to open science and the quest for knowledge. But it turns out most of them had published their data online or included it all in the paper so they didn’t need to make it available.)
The researchers emailed the authors of all of the 1,792 manuscripts which did say data would be available upon request. Of those 1,792, a total of 254 bothered to write back, and of those 254, only 122 actually shared the data. So fewer than 7 per cent of the people who said they’d share the data (122 of 1,792) in fact did so.
This is despite the researchers sending reminders, signing the various non-disclosure or data transfer agreements, and sending official letters of request from their university where required. They jumped through the necessary hoops. But still, hardly anyone shared their data.
(“Two authors demanded reimbursement” and “one author requested co-authorship for providing us with data” are my two favourite reasons why some of the data was not, in fact, shared.)
Trust no one
The trouble is that science is not trustworthy. It’s not that scientists aren’t trustworthy – or, to rephrase, it’s not that scientists are any more untrustworthy than anyone else, and if I were to bet I’d say they’d probably score above average on most reasonable trustworthiness metrics.
It’s that science has terrible incentives: you need to publish lots of papers to avoid falling behind in your career, and journals will generally only publish papers that find exciting positive results, so you’re driven to get some data and torture it until it confesses so that you can get something you can sell. And to do that over and over again.
And, of course, in rare – but not rare enough – cases, people are driven to plagiarism, or to just making data up.
There’s an old cliché when you write about science, which is to pompously say “And, as the Royal Society’s motto has it, nullius in verba – take no one’s word for it!” That is – the silly religionists might take things on trust, “It’s true because the Bible says,” and all that, but we scientists, we look for ourselves.
Taken literally, it’s silly, of course. I, Tom Chivers, currently doing a Coursera online course in statistical inference (strongly recommended!) so I can have some sub-A-level grasp of what those funny symbols mean in mathematical equations, cannot go and check the maths of the people who say they’ve found gravitational waves.
But other people can. If the people at Ligo say they’ve found patterns in the data that fit the hypothesis that gravitational waves are real, then someone at, I dunno, Cern or the Massachusetts Institute of Technology can go and look at that data. And then they can tell me, and I can decide whether I trust them. (I probably would.)
In order to do that, though, they need the data. If someone’s saying they’ve found that, oh, I dunno, men eat more when women are around, in order to impress them with their manly eating (a Wansink finding), you need to be able to go and look at that data in order to check it.
And all too often, it seems, you can’t. As Nick Brown said grumpily recently on Twitter, after an earlier study along these lines: “When you ask the authors of articles in Science® to share their data, which the journal told you was mandatory when you submitted your article, the response very often boils down to GFY.” And, yes, GFY stands for what you think it stands for.
Self-promotion corner
I’ve stuck to uncontroversial topics this week, like long Covid! And, um, trans women in sport! So I expect no one will set me on fire or anything.
First, I had a look at why we keep seeing claims that “one in five (or whatever) Covid patients will get long Covid”, when we all know hundreds of people who’ve had Covid and it doesn’t feel like 20 per cent of them have long Covid. My answer is that we’re using different definitions of the term “long Covid”.
And I wrote a short piece (not published at the time of writing) about the physical differences between men and women and the difference it makes in sport. I don’t make any policy proposals – you can use science to inform political decisions, but it doesn’t make them for you. But we need to acknowledge that there are tradeoffs, because trans women will have significant advantages in various sports, and some sportswomen assigned female at birth will lose out on some places on national teams and so on, and there are safety considerations in contact sports. I argue that we need to admit these things if we’re going to make informed decisions.
Nerdy blogpost of the week: I Can Tolerate Anything Except The Outgroup
I am going to allow myself to link to a Scott Alexander post this week. As I’ve said before, I need to be careful doing that, because I became so obsessed with them in about 2016-17 that I actually had to put a blocker on my laptop in order to get any work done. (Websites blocked: Twitter.com; slatestarcodex.com.)
This one is genuinely one of the pieces of writing that has most shaped my thought. It’s about how we all think we’re super-tolerant people, but in fact we’re only “tolerant” of things we actually don’t mind. Real “tolerance” is putting up with stuff you don’t like, don’t agree with. When liberals like me congratulate ourselves for being more “tolerant” because we support gay marriage or whatever, we’re not tolerating anything, because we don’t have any problem with gay marriage.
He starts by quoting GK Chesterton:
In Chesterton’s The Secret of Father Brown, a beloved nobleman who murdered his good-for-nothing brother in a duel thirty years ago returns to his hometown wracked by guilt. All the townspeople want to forgive him immediately, and they mock the titular priest for only being willing to give a measured forgiveness conditional on penance and self-reflection. They lecture the priest on the virtues of charity and compassion.
Later, it comes out that the beloved nobleman did not in fact kill his good-for-nothing brother. The good-for-nothing brother killed the beloved nobleman (and stole his identity). Now the townspeople want to see him lynched or burned alive, and it is only the priest who – consistently – offers a measured forgiveness conditional on penance and self-reflection.
The priest tells them:
“It seems to me that you only pardon the sins that you don’t really think sinful. You only forgive criminals when they commit what you don’t regard as crimes, but rather as conventions. You forgive a conventional duel just as you forgive a conventional divorce. You forgive because there isn’t anything to be forgiven.”
He further notes that this is why the townspeople can self-righteously consider themselves more compassionate and forgiving than he is. Actual forgiveness, the kind the priest needs to cultivate to forgive evildoers, is really really hard. The fake forgiveness the townspeople use to forgive the people they like is really easy, so they get to boast not only of their forgiving nature, but of how much nicer they are than those mean old priests who find forgiveness difficult and want penance along with it.
It’s a long piece, but it’s worth it. Although I will say that upon rereading it I notice that he cites work using the Implicit Association Test, and you probably ought to take that with a pinch of salt, since (replication crisis!) it hasn’t really stood the test of time.