The fix is simpler: include counterfactuals in the training data, i.e. examples where the image contradicts what the text alone would suggest. The VLM then learns that it cannot rely on its language priors and world knowledge, and has to look at the image.
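As a rough illustration of what "present in the training data" could look like in practice, here is a minimal sketch that mixes counterfactual samples into an ordinary VQA-style training list. Everything in it is an assumption made for illustration: the `Sample` record, the `mix_counterfactuals` helper, the banana example, and the 30% mixing fraction are all hypothetical, and the right fraction would have to be found empirically.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical training record for a VLM (illustrative, not a real library type)."""
    image_id: str
    question: str
    answer: str
    counterfactual: bool = False  # True when the image contradicts the usual prior


def mix_counterfactuals(standard, counterfactual, cf_fraction=0.3, seed=0):
    """Return a shuffled list in which about `cf_fraction` of samples are counterfactual.

    The 0.3 default is an arbitrary placeholder, not a recommended value.
    """
    if not 0.0 <= cf_fraction < 1.0:
        raise ValueError("cf_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_cf / (len(standard) + n_cf) = cf_fraction for n_cf.
    n_cf = round(len(standard) * cf_fraction / (1.0 - cf_fraction))
    # Never ask for more counterfactuals than are available.
    n_cf = min(n_cf, len(counterfactual))
    mixed = list(standard) + rng.sample(list(counterfactual), n_cf)
    rng.shuffle(mixed)
    return mixed


# Ordinary samples: the image agrees with the language prior (bananas are yellow).
standard = [
    Sample(f"img_{i}", "What color is the banana?", "yellow") for i in range(7)
]
# Counterfactual samples: the image shows a blue banana, so the prior is wrong.
counterfactual = [
    Sample(f"cf_{i}", "What color is the banana?", "blue", counterfactual=True)
    for i in range(5)
]

train = mix_counterfactuals(standard, counterfactual, cf_fraction=0.3)
print(sum(s.counterfactual for s in train), "counterfactual of", len(train))
```

The point of keeping the question identical across both groups is that the text alone no longer predicts the answer, so the only way to get the counterfactual samples right is to use the image.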