The Royal Society

Supplementary material from "Can Large Language Models help predict results from a complex behavioural science study?"

Version 2 2024-09-20, 09:51
Version 1 2024-09-04, 12:04
Posted on 2024-09-20 - 09:51
We tested whether Large Language Models (LLMs) can help predict results from a complex behavioural science experiment. In Study 1, we investigated the performance of the widely used LLMs GPT-3.5 and GPT-4 at forecasting the empirical findings of a large-scale experimental study of emotions, gender, and social perceptions. We found that GPT-4, but not GPT-3.5, matched the performance of a cohort of 119 human experts, with correlations of 0.89 (GPT-4), 0.07 (GPT-3.5), and 0.87 (human experts) between aggregated forecasts and realized effect sizes. In Study 2, providing participants from a university subject pool the opportunity to query a GPT-4 powered chatbot significantly increased the accuracy of their forecasts. Results indicate promise for AIs to help anticipate—at scale and minimal cost—which claims about human behaviour will find empirical support and which ones will not. Our discussion focuses on avenues for human-AI collaboration in science.

Royal Society Open Science

AUTHORS (7)

Steffen Lippert
Anna Dreber
Magnus Johannesson
Warren Tierney
Wilson Cyrus-Lai
Eric Luis Uhlmann
Thomas Pfeiffer