Newer large language models (LLMs) are less likely to admit when they don't know the answer to a user's question, making them less reliable, according to a new study. Artificial intelligence (AI) researchers from the Universitat Politècnica de València in Spain tested the latest versions of BigScience's BLOOM, Meta's Llama, and OpenAI's GPT for accuracy by asking each model thousands of questions on maths, science, and geography. The researchers compared the quality of each model's answers and classified each response as correct, incorrect, or avoidant.