Overall, it was wrong about half the time across journals, but in the Nature/Science journals it was wrong in about 95% of cases. So they are the ones setting the bad example.
After we called them on this, they introduced reporting standards. Did they and their referees learn about the statistics?
So recently we got this figure in a Nature journal, and the work was splashed across the media... The great new hope.
So we looked at the data.
Figure legend. Clinical scores of two independent EAE experiments at d23 post disease induction. Individual scores as well as the mean score of two independent experiments are shown. Control: n=10, vehicle: n=13, xxxxx-345: n=11. Control versus vehicle: P=0.620, control versus xxxxx-345: P=0.017, vehicle versus xxxxx-345 P=0.029. * indicate P values <0.05 and ** indicate P values <0.005 based on a non-paired Student’s t test. Error bars are s.e.m.
There is a view in clinical trials that you should supply the primary data so it can be re-analysed. Many companies now willingly do this. This will probably happen for science papers too.
So in the figure above they provide the primary data. In the drug-treated animals the scores appear to be 0, 0, 0, 0.5, 0.5, 2, 2.5, 3, 3.5, 3.5, 3.5 (n=11); in the vehicle group the scores appear to be 0.5, 2.5, 2.5, 2.5, 2.75, 2.75, 3, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5 (n=13).
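If you want to re-analyse the figure yourself, the first step is to type the scores in and check that the group sizes match the legend. This is a minimal sketch using only the Python standard library; the list names `drug` and `vehicle` and the helper `summarise` are our own, and the scores are read off the plot by eye, so treat them as approximate.

```python
import statistics

# Scores read off the figure by eye (approximate transcription, not the authors' raw file)
drug = [0, 0, 0, 0.5, 0.5, 2, 2.5, 3, 3.5, 3.5, 3.5]          # xxxxx-345, legend says n=11
vehicle = [0.5, 2.5, 2.5, 2.5, 2.75, 2.75, 3,
           3.5, 3.5, 3.5, 3.5, 3.5, 3.5]                      # vehicle, legend says n=13


def summarise(name, scores):
    """Print n, mean, s.e.m. (as in the figure's error bars) and median for one group."""
    n = len(scores)
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / n ** 0.5  # sample SD divided by sqrt(n)
    median = statistics.median(scores)
    print(f"{name}: n={n}, mean={mean:.2f}, s.e.m.={sem:.2f}, median={median}")
    return n, mean, sem, median


summarise("xxxxx-345", drug)
summarise("vehicle", vehicle)
```

The medians are worth a look alongside the means, because the scores are bunched at the top of the scale rather than spread symmetrically around the mean.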
Do a t test of drug versus vehicle and you get P=0.029 (as in the figure legend above), so all is fine and the drug is great. The media go mad... and it's the next best thing since sliced bread :-)
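Here is a sketch of that unpaired Student's t test, assuming SciPy is available. `equal_var=True` gives the classic Student's test named in the legend rather than Welch's version. With the scores transcribed above it should land close to the legend's P=0.029.

```python
from scipy import stats

# Scores transcribed from the figure (approximate)
drug = [0, 0, 0, 0.5, 0.5, 2, 2.5, 3, 3.5, 3.5, 3.5]
vehicle = [0.5, 2.5, 2.5, 2.5, 2.75, 2.75, 3,
           3.5, 3.5, 3.5, 3.5, 3.5, 3.5]

# Unpaired Student's t test, pooling the variance of the two groups (equal_var=True)
t_stat, p_value = stats.ttest_ind(drug, vehicle, equal_var=True)
print(f"t = {t_stat:.3f}, two-sided P = {p_value:.3f}")
```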
However, you should not do a t test on this type of data. A t test assumes that (a) the data are normally distributed. Test this and it passes, p=0.152.
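A normality check can be sketched as below, assuming SciPy. The post does not say which normality test gave p=0.152, so the Shapiro-Wilk test here is our assumption, applied to each group separately. It may well give different p-values from the one quoted, and a pass does not make a bounded clinical score continuous.

```python
from scipy import stats

# Scores transcribed from the figure (approximate)
groups = {
    "xxxxx-345": [0, 0, 0, 0.5, 0.5, 2, 2.5, 3, 3.5, 3.5, 3.5],
    "vehicle": [0.5, 2.5, 2.5, 2.5, 2.75, 2.75, 3,
                3.5, 3.5, 3.5, 3.5, 3.5, 3.5],
}

# Shapiro-Wilk test per group (an assumed choice of test; the original one is not named)
normality = {}
for name, scores in groups.items():
    result = stats.shapiro(scores)
    normality[name] = result.pvalue
    print(f"{name}: W = {result.statistic:.3f}, P = {result.pvalue:.3f}")
```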
However, it also assumes that (b) the groups have equal variances (the variance is the square of the standard deviation). Test for that and it fails, P<0.05. So it is not valid to do a t test on this data. More importantly, a t test was never appropriate here, because the data are not parametric: clinical scores are an ordinal scale, so they call for a non-parametric test.
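The equal-variance check can be sketched with Levene's test in SciPy. The post does not name the test it used, so Levene's test centred on the median (the Brown-Forsythe variant, SciPy's default) is our assumption. The ratio of the two sample variances is printed too, since that is the quantity an F test for equal variances would look at.

```python
import statistics

from scipy import stats

# Scores transcribed from the figure (approximate)
drug = [0, 0, 0, 0.5, 0.5, 2, 2.5, 3, 3.5, 3.5, 3.5]
vehicle = [0.5, 2.5, 2.5, 2.5, 2.75, 2.75, 3,
           3.5, 3.5, 3.5, 3.5, 3.5, 3.5]

# Sample variances (n - 1 denominator) and their ratio
var_drug = statistics.variance(drug)
var_vehicle = statistics.variance(vehicle)
print(f"variance ratio (drug / vehicle) = {var_drug / var_vehicle:.2f}")

# Levene's test with median centring (Brown-Forsythe); small P means unequal variances
levene_stat, levene_p = stats.levene(drug, vehicle, center="median")
print(f"Levene W = {levene_stat:.2f}, P = {levene_p:.4f}")
```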
So use a non-parametric test, such as the Mann-Whitney U test. This has less power to detect differences than a t test.
Do this on the data above and P=0.082... Ooooops.
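A minimal Mann-Whitney U sketch with SciPy follows. The scores contain many ties, so we ask explicitly for the normal approximation (`method="asymptotic"`, with SciPy's default continuity correction). Statistics packages handle ties and the continuity correction in slightly different ways, so the printed P may not match 0.082 exactly. What matters is which side of 0.05 it falls on.

```python
from scipy import stats

# Scores transcribed from the figure (approximate)
drug = [0, 0, 0, 0.5, 0.5, 2, 2.5, 3, 3.5, 3.5, 3.5]
vehicle = [0.5, 2.5, 2.5, 2.5, 2.75, 2.75, 3,
           3.5, 3.5, 3.5, 3.5, 3.5, 3.5]

# Two-sided rank-based comparison; ties are handled via the normal approximation
result = stats.mannwhitneyu(drug, vehicle, alternative="two-sided", method="asymptotic")
u_stat, p_value = result.statistic, result.pvalue
print(f"U = {u_stat}, two-sided P = {p_value:.3f}")
```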
Now compare drug versus untreated controls as well: this also fails, P=0.121, so not even a trend :-)
There is no statistically significant effect. The drug has not worked! So you either accept this or do more studies to show whether this so-called trend is real. That is harder to do in humans, but much easier in animal studies.
Simple school-boy stuff. All the reviewers need to do is read: