A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia

Giovanni Monea¹, Maxime Peyrard², Martin Josifoski¹, Vishrav Chaudhary³, Jason Eisner³, Emre Kıcıman³, Hamid Palangi³, Barun Patra³, Robert West¹
¹EPFL, ²Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, ³Microsoft Corporation

Leaderboard


Fakepedia - Base

Rank  Model name                Grounding accuracy
1     Mistral-7B-Instruct-v0.1  92%
2     Llama-2-70b-chat          90%
3     Llama-2-13b-chat          84%
4     gpt-3.5-turbo-0301        61%
5     Zephyr-7b-β               58%
6     gpt-3.5-turbo-0613        54%
7     gpt-3.5-turbo-1106        50%
8     gpt-4-1106-preview        28%
9     Llama-2-7b-chat           22%

Fakepedia - Multihop

Rank  Model name                Grounding accuracy
1     Llama-2-13b-chat          82%
2     Llama-2-70b-chat          71%
3     Mistral-7B-Instruct-v0.1  60%
4     gpt-4-1106-preview        50%
5     Zephyr-7b-β               10%
6     gpt-3.5-turbo-1106        10%
7     gpt-3.5-turbo-0613        8%
8     gpt-3.5-turbo-0301        7%
9     Llama-2-7b-chat           4%