Recently, Yoshua Bengio published a blog post called “Time to rethink the publication process in machine learning”. I criticise the quality of (machine learning) research a lot on this blog. Hence, let’s have a look at some of his points.

Comment on the problem statement

I have been part of discussions with program committees about how to improve these conferences, but usually the discussions are about incremental changes.

Well, this seems to be a common problem nowadays. Many people seem to be terribly afraid of large changes and proper solutions to problems, either because someone could be offended or because it involves a certain degree of failure…

The research culture has also changed in the last few decades. It is more competitive, everything is happening fast and putting a lot of pressure on everyone, the field has grown exponentially in size, students are more protective of their ideas and in a hurry to put them out, by fear that someone else would be working on the same thing elsewhere, and in general a PhD ends up with at least 50% more papers than what I gather it was 20 or 30 years ago.

I think there is nothing wrong with a bit more competition in research than in the past. While cooperation should lead to better results, I would argue that a lot of “cooperation” seems to be more of a cover-up to prevent your “collaborators” from making any progress.

Regarding the number of papers, I split this observation into several categories:

  1. Unlike 20-30 years ago, a dissertation nowadays is written around several journal publications, whereas in the past journal papers might simply be derived from a dissertation.

  2. The barrier to publishing in and accessing journals has been lowered (e.g. online-only journals).

  3. A larger concept is split into several publications to allow for a cleaner thought process per paper (e.g. articles 1-3 each introduce a core concept, paper 4 glues them together).

  4. Technological advances. Things that would have taken a whole PhD period (3-5 years) to compute can be computed in several days. This allows for more experimentation and, if done correctly, leads to more results which can be published (even as a side project).

  5. Last but not least: publish or perish. It doesn’t matter how bad a paper is, and it doesn’t really matter if the content of the paper reflects what is promised in its title/abstract. The only thing that counts is that something is published somewhere (even in a journal published by someone’s own department/institute - very common). Obviously, this is the point that causes the most trouble and outranks all the other reasons for more publications.

The field has almost completely switched to a conference publication model (in fact a large part of computer science has done so too) […]

This is an interesting point. I used to think of conference presentations and posters (with or without an accompanying article) as a way to discuss ideas and put “unfinished” concepts/proposals out there for discussion and scientific scrutiny, whereas a journal paper should be something more sophisticated. Nowadays, I come across blog posts of higher quality than conference proceedings or journal papers.

Many papers end up being submitted that in the past would not have been submitted. They may contain errors, lack in rigour or simply be rather incremental.

Submitting is one thing, but why are they accepted - this is the more interesting question ;). Further, it is difficult to find a paper that doesn’t contain a ton of errors. Even “reference implementations” published on GitHub contain errors or show code that differs from the math promised in the paper.

On the other hand, I am convinced that some of the most important advances have come through a slower process, with the time to think deeply, to step back, and to verify things carefully. Pressure has a negative effect on the quality of the science we generate.

I disagree with the notion of “speed” here. I think it would be wise to talk about a thorough process instead of a slow process. A thorough process might be super fast or super slow, but it should guarantee a certain quality. This is absolutely contrary to the current trend in research.

However, I would like to point out something else here: many papers are written close to a deadline anyhow because everyone is busy with other things. If we look at regions like Austria/Germany/Switzerland, most PhD students are employees of a university or research institute and mainly work on things not related to their project. In other regions this might be less extreme, but to some degree people elsewhere on this planet seem to spend more time on Twitter than on their research ;)

Or in other words: a lot of the “slowness” I experienced in research was caused by researchers simply not having time for their research (or no interest in a particular topic). Further, many people promoting “slow research” were simply lazy and wanted to get paid for drinking coffee and doing nothing. (This is a problem for the few left who are really interested in research.)

Comment on the solution proposed

My feeling is that besides the poor incentives for reviewing, our current system incentivizes incremental work […]

This is particularly bad for journals run by so-called publishers. They get shitloads of money from libraries while reviewers are often not paid (oh yes, they do it for reputation - good joke!).

This brings to mind a different model, one where papers are first submitted to a fast turnaround journal (which could be JMLR in this case) and program committees of each conference then pick the papers they like the most from the list of already accepted and reviewed (and scored) papers (assuming the authors are interested in having their work presented at a conference).

Wow. This is heavy. While I do consider JMLR one of the better ML-related journals, I think this process would introduce a super large queue of papers waiting to be reviewed. And once more journals join this process, it is compromised again (unfortunately)…

But now we have arXiv which plays that role much better, so the main role of conferences, besides the socializing, should be to select work to be highlighted and presented orally, to create a diversified offer of the best and most important ideas arising in our community, to synchronize researchers around this progress. It doesn’t even have to be super-recent work, it could be work which got done 1 or 2 years ago and is only recently picking up steam in terms of impact.

This is an interesting idea. However, I’m afraid that this system would be compromised again after a few years. Moreover, it would require that the many people involved in reviewing and organizing conferences:

  1. understand the topic/field and math behind it
  2. remain open to seemingly crazy ideas.

Additional thoughts

I see these problems across all disciplines in science and engineering but am going to focus on ML here.

The dataset problem

Currently, we clearly see a trend towards over-optimizing architectures for several standard datasets (e.g. ImageNet, Pascal VOC, COCO for computer vision). However, a small subset of interesting papers might contain an ablation study on “non-standard” datasets where some newly proposed idea outperformed everything else but obtained only similar results on the standard datasets. (Personally, I don’t care about a 0.5% improvement but about significant advances which could cut down model sizes, training times and inference times.) As a replacement for MNIST, I proposed CMNIST: everything that would otherwise be tested on 784 pixels should instead use/generate arbitrarily complex datasets of this spatial dimension.
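
To make the “generate arbitrary datasets of this spatial dimension” idea a bit more concrete, here is a minimal sketch - my own illustration for this post, not the actual CMNIST code - of generating a synthetic classification dataset with the same 784-pixel dimensionality as MNIST and tunable difficulty; the helper name make_784px_dataset and the parameter choices are made up for this example.

```python
# Hypothetical illustration (not the actual CMNIST implementation):
# generate a synthetic classification dataset whose samples have the
# same 784-pixel dimensionality as MNIST, with tunable difficulty.
from sklearn.datasets import make_classification


def make_784px_dataset(n_samples=10000, n_classes=10, n_informative=100, seed=0):
    """Return (X, y) where X can be reshaped to 28x28 'images'."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=28 * 28,           # same dimensionality as MNIST
        n_informative=n_informative,  # number of informative dimensions (task difficulty knob)
        n_redundant=0,
        n_classes=n_classes,
        random_state=seed,
    )
    return X.reshape(-1, 28, 28), y


X, y = make_784px_dataset()
print(X.shape, y.shape)  # (10000, 28, 28) (10000,)
```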

I propose that every new idea should be tested on at least 10 non-standard datasets.

Economic implications

This is an interesting point. Nowadays, you can’t write a grant proposal without making up some economic benefit the project leads to (KPIs and all that useless shit). Has anyone ever assessed the economic loss caused by bad papers and research in general? Just think about how many useless/incorrect papers we read every week which are basically a waste of time and money. Or how much damage is caused by incompetent researchers in industrial R&D?

Reviewer’s bias

There is one major issue with peer-review: do the reviewers apply the scientific method or do they follow their ideologies and personal interests? Some reviewers require unrelated stuff in an article to get themselves or some of their peers cited.

Summary

Research is broken beyond repair.