Invited Speakers

Invited Speaker 1: Sandro Pezzelle

Title: From semantic understanding to human-like communication: Implicit and underspecified language as a testbed for large language models

Abstract: The language we use in everyday communicative contexts exhibits a variety of phenomena—such as ambiguity, missing information, or semantic features expressed only indirectly—that often make it implicit or underspecified. Despite this, people are good at understanding and interpreting it. This is possible because we can exploit additional information from the linguistic or extralinguistic context and from shared or prior knowledge. Given the ubiquity of these phenomena, NLP models must handle them appropriately to communicate effectively with users and to avoid biased behavior that can be potentially harmful. In this talk, I will present recent work from my group investigating how state-of-the-art transformer large language models (LLMs) handle these phenomena. In particular, I will focus on the understanding of sentences with atypical animacy (“a peanut fell in love”) and on the interpretation of sentences that are ambiguous (“Bob looked at Sam holding a yellow bag”) or where some information is missing or implicit (“don’t spend too much”). I will show that, in some cases, LLMs behave surprisingly similarly to speakers; in other cases, they fail quite spectacularly. I will argue that having access to multimodal information (e.g., from language and vision) should, in principle, give these models an advantage on these semantic phenomena—as long as we take a perspective aware of the communicative aspects of language use.

Bio: Sandro Pezzelle (https://sandropezzelle.github.io/) studies human-like natural language understanding and generation in text-only large language models (LLMs) and their multimodal, language-and-vision counterparts (VLMs). As such, his work combines methods and insights from Natural Language Processing, Computer Vision, and Cognitive Science. His current research interests span LLM and VLM evaluation and interpretability inspired by human cognition, how the learning of semantic and pragmatic abilities compares between humans and machines, and whether (and how) the cognitive mechanisms underlying human language communication can be used to develop better language models. He has co-authored articles at top-tier conferences (ACL, EMNLP, EACL, NAACL, CoLM) and in journals (TACL, Cognition, Cognitive Science). He is a member of the ELLIS society, a faculty member of the ELLIS Amsterdam Unit, and a board member of SIGSEM, the ACL special interest group in computational semantics. He organized the UnImplicit workshop on understanding implicit and underspecified language at EACL 2024.

Invited Speaker 2: Frank Keller

Title: Predicting Human Sentence Production across Modalities and Languages
Joint work with Moreno I. Coco, Eunice G. Fernandes, and Manabu Arai

Slides: you can get the PDF here.

Abstract: Human cognition is a highly integrated system which synchronizes processes and representations across modalities. Previous research on the synchronization between attention and sentence production demonstrated that similar scene descriptions correspond to similar sequences of attended objects (scan patterns). In this work, we investigate whether this finding generalizes from English to languages with different word order. We test whether synchronicity holds not just within a language but across languages, and we examine the relative contribution of syntax and semantics. Three groups of participants speaking English, Portuguese, or Japanese described objects situated in a visual scene while being eye-tracked. Across all participants, pairwise sentence similarity was computed using the Universal Sentence Encoder, which generates multilingual vector-based meaning representations. Part-of-Speech (PoS) sequences were assigned to the produced sentences, and similarities between PoS sequences, as well as between scan patterns, were measured using the Longest Common Subsequence. We found that similar sentences are associated with similar scan patterns in all three languages. Moreover, we demonstrated for the first time that this relationship holds across languages (e.g., if a Japanese and a Portuguese sentence are semantically similar, their associated scan patterns are also similar). In contrast, we find that syntactic similarity (i.e., PoS similarity) is predicted by scan patterns only within the same scene and only between languages with similar word order. This confirms that visual attention and language production are synchronized across languages and modalities, and points to a grammar of perception that is language-independent, goes beyond syntactic realizations, and manifests in oculomotor responses such as eye movements.
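The Longest Common Subsequence measure used for scan patterns and PoS sequences can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example scan patterns, and the max-length normalization are assumptions for exposition.

```python
def lcs_length(a, b):
    # Classic dynamic-programming Longest Common Subsequence
    # over two label sequences (e.g., fixated objects or PoS tags).
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def sequence_similarity(a, b):
    # Normalize by the longer sequence so the score lies in [0, 1];
    # the normalization choice is illustrative.
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))

# Hypothetical scan patterns: sequences of attended scene objects.
p1 = ["man", "dog", "ball", "man"]
p2 = ["man", "ball", "man"]
print(sequence_similarity(p1, p2))  # LCS "man, ball, man" -> 3/4 = 0.75
```

The same similarity can then be correlated with sentence-embedding similarity (e.g., cosine similarity between Universal Sentence Encoder vectors) to test whether similar descriptions go with similar scan patterns.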

Bio: Frank Keller (https://homepages.inf.ed.ac.uk/keller/) is a professor in the School of Informatics at the University of Edinburgh. He has also held visiting positions at MIT and the University of Washington. His research area is natural language processing, with a particular focus on language and vision tasks such as image description, visual grounding, video summarization, and visual storytelling. His second main research interest is computational narrative, where he works on modeling key narrative concepts such as characters, plot turning points, and suspense. This involves understanding or generating long-form texts such as movie scripts or books, which is challenging for LLMs.

Prof. Keller is part of the leadership team of the UKRI Centre for Doctoral Training in Natural Language Processing, serves on the editorial board of the Transactions of the ACL, and is an ELLIS fellow. In the past, he held an ERC grant in the area of language and vision.

Invited Speaker 3: Aida Nematzadeh

Title: Leveraging Cognitive Science to Unravel the Complexities of Generative Models

Slides: you can get the PDF here.

Abstract: Recent generative models have demonstrated remarkable capabilities, from solving intricate reasoning problems to creating highly realistic images. However, as these models grow more complex, evaluating them presents increasing challenges—particularly since we often have access only to their outputs, not the underlying mechanisms. This predicament mirrors a challenge faced by cognitive scientists: understanding human cognition by observing behavior without direct access to the “cognitive model” itself. In this talk, I will explore how principles from cognitive science can illuminate the evaluation of generative models. I will discuss how cognitive science approaches, such as experimental design in human data collection, probing for specific capabilities, and developing automated evaluation metrics, can offer valuable insights into understanding and assessing these advanced models.

Bio: Aida Nematzadeh (http://www.aidanematzadeh.me/) is a Research Scientist at DeepMind, where she explores the intersection of computational linguistics, cognitive science, and machine learning. Her recent work focuses on multimodal learning, as well as the evaluation and analysis of neural representations. Prior to joining DeepMind, Aida was a postdoctoral researcher at UC Berkeley. She holds a PhD and an MSc in Computer Science from the University of Toronto.