Unsupervised Question Decomposition for Question Answering

Ethan Perez, Patrick Lewis, Wen-tau Yih, Kyunghyun Cho, Douwe Kiela (EMNLP 2020)

[link]

This paper tackles a problem that seems fundamentally important to me. Given a question, how can we decompose it into easier questions? This is clearly related to how humans answer questions. Suppose you don't know Alice and Bob, and I ask you: "who is older, Alice or Bob?". That naturally maps to at least three subquestions: "how old is Alice?", "how old is Bob?", and "which number is greater?". It turns out we have Question Answering models that are good at answering each of these simpler questions. But how do we break down the complex question in the first place?

The paper proposes to use retrieval as a way to form subquestions. In particular, it assumes access to a large set of questions without labels or answers; in the paper, these are mined from Common Crawl, i.e. collected from the Web. Then, given a complex question, they retrieve a pair of questions that maximally diverge from each other while still being related to the complex question. Using back-translation and several denoising objectives, they then train a model for question decomposition, along with a recomposition model that, given the decomposed question and the answers a simple QA model gives to each subquestion, produces an answer to the complex question.
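To make the retrieval step concrete, here is a minimal sketch of that pair-selection idea: score candidate pairs by their similarity to the complex question minus their similarity to each other, over some vector embeddings. The exact scoring function, the pruning heuristic, and all names here are my assumptions, not the paper's implementation.

```python
import numpy as np

def pseudo_decompose(q_vec, cand_vecs, top_k=100):
    """Pick a pair of candidate subquestions that are each similar to the
    complex question but dissimilar to each other. This is a rough sketch of
    the pseudo-decomposition objective; the scoring is an assumption."""
    # Normalize so dot products are cosine similarities.
    q = q_vec / np.linalg.norm(q_vec)
    C = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)

    sim_to_q = C @ q
    # Prune to the top-k candidates most related to q, to keep the
    # quadratic pair search over millions of mined questions cheap.
    idx = np.argsort(-sim_to_q)[:top_k]

    best, best_score = None, -np.inf
    for a_pos, a in enumerate(idx):
        for b in idx[a_pos + 1:]:
            # Reward relatedness to q, penalize similarity within the pair.
            score = sim_to_q[a] + sim_to_q[b] - C[a] @ C[b]
            if score > best_score:
                best, best_score = (int(a), int(b)), score
    return best
```

On toy embeddings where the complex question is the sum of two orthogonal subquestion vectors, this objective correctly prefers the two orthogonal subquestions over a near-paraphrase of the full question, which is exactly the behavior the divergence term is meant to encourage.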

There are a number of things in the paper that I find inelegant, solution-wise. For example, the model always decomposes a question into exactly two subquestions. If you give it a simple question that the base QA model could answer directly, it would still decompose it. Conversely, it can't decompose a question into more than two subquestions, and extending their objective beyond pairs seems computationally hard. The model is also quite complicated, an amalgamation of different unsupervised objectives, which has been common in unsupervised NLP. Finally, the idea of maximally diverging subquestions does not seem sound to me. For example, "how old is Alice?" and "how old is Bob?" are very similar, yet they are exactly the right questions to ask. Their model seems to produce paraphrases (e.g. "how many years ago was Bob born?") to get around that, which doesn't feel like what should happen. I think you want questions that provide different (complementary) useful bits of information, not necessarily questions that are as divergent as possible. For instance, you don't want to ask both "how old is Alice?" and "what year was Alice born in?", even though they are quite different on the surface.

However, it is to their merit that they find a way to do all of this without supervision, which is new, and the problem seems quite important to me. That I like, for sure. Looking forward to further work on this.

Gabriel Poesia
Computer Science PhD student