The ‘Reversal Curse’ is a phenomenon described in a 2023 study: large language models that learn “A is B” don’t automatically generalize to “B is A”. A recent preprint focused on medical Visual Question Answering (MedVQA) suggests this phenomenon also carries over to multimodal models. Uh-oh! While these models continue to shatter records on industry benchmarks, the researchers call the robustness of these evals into question: what are they even measuring?