Details of Grant

EPSRC Reference:

EP/V05645X/1

Title:

ReproHum: Investigating Reproducibility of Human Evaluations in Natural Language Processing

Principal Investigator:

Belz, Professor A

Other Investigators:

Reiter, Professor E

Researcher Co-Investigators:

Project Partners:

Charles University	Edinburgh Napier University	Free (VU) University of Amsterdam
Heriot-Watt University	McGill University	Peking University
Pompeu Fabra University	Technical University of Darmstadt	Technological University of Dublin
Tilburg University	Trinity College Dublin	Trivago N.V.
University of Groningen	University of Heidelberg	University of Malta
University of Manchester, The	University of Michigan	University of North Carolina Charlotte
University of Santiago de Compostela	Utrecht University

Department:

Computing Science

Organisation:

University of Aberdeen

Scheme:

Standard Research

Starts:

01 April 2022

Ends:

31 May 2024

Value (£):

227,202

EPSRC Research Topic Classifications:

Artificial Intelligence

Computational Linguistics

EPSRC Industrial Sector Classifications:

Information Technologies

Related Grants:

Panel History:

Panel Date	Panel Name	Outcome
23 Mar 2021	EPSRC ICT Prioritisation Panel March 2021	Announced

Summary on Grant Application Form

Over the past few months, we have laid the groundwork for the ReproHum project (summarised in the 'pre-project' column of the Work Plan document) with (i) a study of 20 years of human evaluation in natural language generation (NLG) that reviewed and labelled 171 papers in detail, (ii) the development of a classification system for NLP evaluations, (iii) a proposal for a shared task on the reproducibility of human evaluation in NLG, and (iv) a proposal for a workshop on human evaluation in NLP. We have built an international network of 20 research teams currently working on human evaluation who will actively contribute to this project (see Track Record section), with combined in-kind contributions of over £80,000. This pre-project activity gives the proposed work an advantageous starting position and means we can 'hit the ground running' with the scientifically interesting core of the work.

In this foundational project, our key goals are to develop a methodological framework for testing the reproducibility of human evaluations in NLP and a multi-lab paradigm for carrying out such tests in practice, and to carry out the first study of this kind in NLP. We will (i) systematically diagnose the extent of the human evaluation reproducibility problem in NLP and survey related current work that addresses it (WP1); (ii) develop the theoretical and methodological underpinnings for reproducibility testing in NLP (WP2); (iii) test the suitability of the shared-task paradigm (uniformly popular across NLP fields) for reproducibility testing (WP3); (iv) create a design for multi-test reproducibility studies, and run the ReproHum study, a large-scale international multi-lab effort conducting 50+ individual, coordinated reproduction attempts on human evaluations in NLP from the past 10 years (WP4); and (v) nurture and build international consensus on how to address the reproducibility crisis, via technical meetings and by growing our international network of researchers (WP5).

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Sectors submitted by the Researcher

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

Further Information:

Organisation Website:

http://www.abdn.ac.uk