This paper proves that ``learning disentangled representations is impossible'' without inductive biases. This is a somewhat expected result (as the authors admit) once you consider *all* the generative models that could have given rise to the observed data: for any algorithm that recovers the axes of variation of the latent space, there are infinitely many equivalent generative models that produce exactly the same observed distribution but in which the recovered axes are completely ``entangled'' (changes along one always cause changes along the others).
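To make the flavor of this argument concrete, here is a minimal numpy sketch of the standard style of counterexample (a rotation of an isotropic Gaussian prior), not the paper's exact construction: the rotated latent space induces the same data distribution as the original one, yet every rotated coordinate mixes both ground-truth factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth "disentangled" latents: two independent factors.
z = rng.standard_normal((100_000, 2))

# An alternative latent space obtained by rotating the original one.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
z_rot = z @ R.T

# Both latent spaces have the same (isotropic Gaussian) distribution, so a
# decoder composed with the inverse rotation generates identical data.
print(np.cov(z, rowvar=False).round(2))      # ~ identity
print(np.cov(z_rot, rowvar=False).round(2))  # ~ identity

# ...yet each rotated coordinate depends on both ground-truth factors:
# moving factor 0 alone changes *both* rotated coordinates ("entangled").
delta = np.array([1.0, 0.0]) @ R.T
print(delta.round(2))  # both entries non-zero
```

Since no algorithm that only sees the observed data can distinguish the two latent spaces, it cannot prefer the disentangled one without extra assumptions.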
The other main set of observations is that, empirically, many recent disentangled representation learning algorithms give very unreliable results, even when the underlying data does have disentangled axes of variation. This also does not come as a surprise, as most ML models are extremely brittle and sensitive to dataset and hyperparameter choices. Unfortunately, the default expectation for anyone who has done deep learning in practice is that it takes a lot of effort to get things to work in new settings (when they work at all).
This does not mean that disentangled representation learning is impossible in practice, as the authors acknowledge (end of Section 3). Real-world data may well have the required properties, but what the theorem says is that recovering the disentangled representation requires the algorithm to have appropriate inductive biases. Clearly, each existing model carries its own biases, even if they are left implicit in its presentation. The takeaway is perhaps that making those biases explicit and understanding them deserves far more attention.