HumEval 2024: The Fourth Workshop on Human Evaluation of NLP Systems

Turin (Italy)
Event Date: May 21, 2024
Submission Deadline: March 11, 2024
Notification of Acceptance: April 04, 2024
Camera Ready Version Due: April 19, 2024




Call for Papers


The Fourth Workshop on Human Evaluation of NLP Systems (HumEval 2024) invites the submission of long and short papers on current human evaluation research and future directions. HumEval 2024 will take place in Turin, Italy, on May 21, 2024, during LREC-COLING 2024.

Website: https://humeval.github.io/

Important dates:
Submission deadline: 11 March 2024
Paper acceptance notification: 4 April 2024
Camera-ready versions: 19 April 2024
HumEval 2024: 21 May 2024
LREC-COLING 2024 conference: 20–25 May 2024

All deadlines are 23:59 UTC-12.

===============================================

Human evaluation plays a central role in NLP, from the large-scale crowd-sourced evaluations carried out e.g. by the WMT workshops, to the much smaller experiments routinely encountered in conference papers. Moreover, while NLP has embraced a number of automatic evaluation metrics, the field has always been acutely aware of their limitations (Callison-Burch et al., 2006; Reiter and Belz, 2009; Novikova et al., 2017; Reiter, 2018; Mathur et al., 2020a), and has gauged their trustworthiness in terms of how well, and how consistently, they correlate with human evaluation scores (Gatt and Belz, 2008; Popović and Ney, 2011; Shimorina, 2018; Mille et al., 2019; Dušek et al., 2020; Mathur et al., 2020b).

Yet there is growing unease about how human evaluations are conducted in NLP. Researchers have pointed out the less than perfect experimental and reporting standards that prevail (van der Lee et al., 2019; Gehrmann et al., 2023), and that low-quality evaluations with crowdworkers may not correlate well with high-quality evaluations with domain experts (Freitag et al., 2021). Only a small proportion of papers provide enough detail for reproduction of human evaluations, and in many cases the information provided is not even enough to support the conclusions drawn (Belz et al., 2023). We have found that more than 200 different quality criteria (such as Fluency, Accuracy, Readability, etc.) have been used in NLP, that different papers use the same quality criterion name with different definitions, and the same definition with different names (Howcroft et al., 2020). Furthermore, many papers do not use a named criterion at all, asking evaluators only to assess 'how good' the output is. Inter- and intra-annotator agreement are usually reported only as an overall number, without analysing the reasons and causes for disagreement and the potential to reduce them.

A small number of papers have aimed to address these issues from different perspectives, e.g. comparing agreement for different evaluation methods (Belz and Kow, 2010), or analysing errors and linguistic phenomena related to disagreement (Pavlick and Kwiatkowski, 2019; Oortwijn et al., 2021; Thomson and Reiter, 2020; Popović, 2021). The context beyond individual sentences that is needed for reliable evaluation has also started to be investigated (e.g. Castilho et al., 2020). All of the above aspects interact in different ways with the reliability and reproducibility of human evaluation measures. While reproducibility of automatically computed evaluation measures has attracted attention for a number of years (e.g. Pineau et al., 2018; Branco et al., 2020), research on the reproducibility of measures involving human evaluation is a more recent addition (Cooper and Shardlow, 2020; Belz et al., 2023).

The HumEval workshops (previously held at EACL 2021, ACL 2022, and RANLP 2023) aim to create a forum for current human evaluation research and future directions: a space for researchers working with human evaluations to exchange ideas and begin to address the many issues that human evaluation in NLP faces, including experimental design, meta-evaluation, and reproducibility. We invite papers on topics including, but not limited to, the following, as addressed in any subfield of NLP:

- Experimental design and methods for human evaluations
- Reproducibility of human evaluations
- Inter-evaluator and intra-evaluator agreement
- Ethical considerations in human evaluation of computational systems
- Quality assurance for human evaluation
- Crowdsourcing for human evaluation
- Issues in meta-evaluation of automatic metrics by correlation with human evaluations
- Alternative forms of meta-evaluation and validation of human evaluations
- Comparability of different human evaluations
- Methods for assessing the quality and the reliability of human evaluations
- Role of human evaluation in the context of Responsible and Accountable AI

Submissions for both short and long papers will be made directly via START, following submission guidelines issued by LREC-COLING 2024. For full submission details please refer to the workshop website.

The third ReproNLP Shared Task on Reproduction of Automatic and Human Evaluations of NLP Systems will be part of HumEval, offering (A) an Open Track for any reproduction studies involving human evaluation of NLP systems; and (B) the ReproHum Track where participants will reproduce the papers currently being reproduced by partner labs in the EPSRC ReproHum project. A separate call will be issued for ReproNLP 2024.




Summary

HumEval 2024: The Fourth Workshop on Human Evaluation of NLP Systems will take place in Turin, Italy. It is a one-day event held on May 21, 2024 (Tuesday).

HumEval 2024 falls under the following areas: NLP, Computational Linguistics, Artificial Intelligence, etc. Submissions for this workshop can be made by Mar 11, 2024. Authors can expect notification of the results of their submissions by Apr 4, 2024. Upon acceptance, authors should submit the final version of the manuscript on or before Apr 19, 2024 via the official website of the workshop.

Please check the official event website for possible changes before making any travel arrangements, as events are generally strict about their deadlines.

Other Details of the HumEval 2024

  • Short Name: HumEval 2024
  • Full Name: The Fourth Workshop on Human Evaluation of NLP Systems
  • Timing: 09:00 AM-06:00 PM (expected)
  • Fees: Check the official website of HumEval 2024
  • Event Type: Workshop
  • Website Link: https://humeval.github.io/
  • Location/Address: Turin (Italy)





OTHER NLP EVENTS

SemDial 2024: The 28th Workshop on the Semantics and Pragmatics of Dialogue
Trento, Italy
Sep 11, 2024
GamesandNLP 2024: Games and NLP 2024 Workshop
Turin, Italy
May 21, 2024
GITT 2024: Second International Workshop on Gender-Inclusive Translation Technologies
Sheffield, UK
Jun 27, 2024
LoResMT 2024: The Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages
Bangkok, Thailand
Aug 15, 2024
SIGDIAL 2024: The 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Tokyo, Japan
Sep 18, 2024