Doctoral School of Mathematics and Computer Science (ED Mathématiques et Informatique)
Grounding LLMs as autotelic reinforcement learning agents
by Clément ROMAC (Institut national de recherche en informatique et en automatique - Bordeaux - Sud-Ouest)
The defense will take place at 3:00 pm in the Ada Lovelace room at the Centre Inria de l'Université de Bordeaux, 200 Av. de la Vieille Tour, 33405 Talence,
before the jury composed of:
- Pierre-Yves OUDEYER - Research Director - Centre Inria de l'Université de Bordeaux - Thesis supervisor
- Ellie PAVLICK - Associate Professor - Brown University - Reviewer
- Prithviraj AMMANABROLU - Assistant Professor - University of California, San Diego - Reviewer
- Hugo LAROCHELLE - Professor - Université de Montréal - Examiner
- Matthieu CORD - Professor - Sorbonne Université - Examiner
- Thomas WOLF - Senior Scientist - Hugging Face - Examiner
Building machines capable of processing and understanding natural language has long been a central goal of artificial intelligence (AI). In recent years, distributional approaches based on deep neural networks have dominated the field. In particular, large language models (LLMs), trained to generate text by imitating large internet corpora, have driven remarkable progress. However, their purely statistical learning paradigm faces growing criticism. In contrast, developmental sciences emphasize that human language acquisition is grounded in sensorimotor experience and social interaction. Humans acquire language by interacting with the physical and social world and use it in functional, goal-directed ways, driven by intrinsic motivation and curiosity.

This thesis explores how to bridge the gap between LLMs and developmental theories of language acquisition by embodying LLMs as curiosity-driven reinforcement learning (RL) agents capable of learning from interaction. We begin by introducing the concept of functional grounding: aligning an agent's internal representations with the external environment to enable prediction, control, and goal achievement. To this end, we propose GLAM, an online RL-based approach that trains LLMs through interaction with a textual environment. We show that GLAM significantly improves functional competence, that is, the model's ability to use language effectively to achieve goals. A follow-up analysis identifies limitations (e.g., sensitivity to prompt format) and shows that combining grounding with varied contexts and contrastive learning increases robustness.

To enhance predictive capabilities, we also introduce WorldLLM, a framework in which LLMs generate and refine natural language theories through curiosity-driven interaction, improving their world modeling abilities.

We then explore functional grounding in more complex environments involving an open-ended set of goals. Inspired by human development, we adopt an autotelic learning approach, in which agents generate, choose, and pursue their own goals. We first address the challenge of sample efficiency with SAC-GLAM, an extension of GLAM that incorporates off-policy RL and hindsight relabeling to better learn from sparse or noisy rewards. Next, we tackle goal selection by extending Learning Progress (LP) methods to language-defined goals. We introduce MAGELLAN, a metacognitive module that enables LLMs to estimate their competence and prioritize goals that maximize LP. MAGELLAN structures exploration by capturing semantic and dynamic relationships between goals. We show that metacognitive abilities help LLMs not only decide what to learn next but also recognize when help is needed and seek external assistance.

Altogether, this research demonstrates that embodying LLMs as autotelic agents opens new avenues for creating grounded, adaptive, and self-improving language models. It offers concrete steps toward building LLMs that not only use language functionally but also learn to model the world and themselves through metacognitive capabilities. Yet much remains to be done, and achieving functional grounding in our physical and social world remains a major challenge for the next generation of intelligent systems.
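As a concrete illustration of the grounding loop behind GLAM, the sketch below shows how an LLM can serve as an RL policy over textual actions: the model scores each candidate action with its token log-probabilities, samples one, and is updated from the resulting reward. This is a minimal sketch rather than the thesis's implementation; a single REINFORCE-style update stands in for the PPO training GLAM uses in practice, and the environment, prompt, candidate actions, and reward below are all invented for illustration.

```python
# Minimal sketch of a GLAM-style grounding step: the LLM is the policy,
# textual actions are scored by their token log-probabilities, and the
# model is updated online from an environment reward.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def action_log_probs(prompt: str, actions: list[str]) -> torch.Tensor:
    """Log-probability of each candidate action, conditioned on the prompt."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    scores = []
    for action in actions:
        ids = tokenizer(prompt + " " + action, return_tensors="pt").input_ids
        logits = model(ids).logits[:, :-1]  # predictions for tokens 1..T-1
        token_lp = F.log_softmax(logits, dim=-1).gather(2, ids[:, 1:].unsqueeze(-1))
        scores.append(token_lp[0, prompt_len - 1 :].sum())  # action tokens only
    return torch.stack(scores)

# One REINFORCE-style update on a single, invented environment step.
prompt = "Goal: go to the red door. Observation: you see a red door east."
actions = ["go east", "go west", "pick up the key"]
dist = torch.distributions.Categorical(logits=action_log_probs(prompt, actions))
a = dist.sample()
reward = 1.0 if actions[a.item()] == "go east" else 0.0  # hypothetical reward
loss = -dist.log_prob(a) * reward                        # policy gradient estimate
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Scoring a fixed set of candidate actions, rather than letting the model generate freely, keeps the action space well defined, which is what makes standard policy-gradient machinery applicable to an LLM.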
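The hindsight relabeling that SAC-GLAM uses against sparse rewards can likewise be illustrated in a few lines: a trajectory that failed its commanded goal is reinterpreted as a successful trajectory for the goal it actually achieved, turning otherwise wasted experience into positive examples for off-policy learning. The data structures and goal strings below are hypothetical simplifications, not the thesis's code.

```python
# Sketch of hindsight experience relabeling for language goals.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Transition:
    observation: str
    goal: str        # language goal the agent was pursuing
    action: str
    reward: float
    achieved: str    # language description of what was actually achieved

def relabel(trajectory: list[Transition]) -> list[Transition]:
    """Rewrite a failed trajectory as a success for the goal actually achieved."""
    final_outcome = trajectory[-1].achieved
    return [
        replace(t, goal=final_outcome, reward=1.0 if t is trajectory[-1] else 0.0)
        for t in trajectory
    ]

# Usage: a trajectory that failed "go to the red door" but reached the key.
traj = [
    Transition("start", "go to the red door", "go north", 0.0, "moved north"),
    Transition("hall", "go to the red door", "pick up key", 0.0, "picked up the key"),
]
relabeled = relabel(traj)
assert relabeled[-1].goal == "picked up the key" and relabeled[-1].reward == 1.0
```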
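Finally, a rough sketch of Learning-Progress-driven goal selection in the spirit of MAGELLAN. Where MAGELLAN uses the LLM itself to predict its competence on language-defined goals, this sketch substitutes per-goal success averages at two timescales; the gap between them serves as an absolute LP estimate, and goals are sampled in proportion to it. The goal names, parameters, and environment outcome below are invented.

```python
# Sketch of Learning-Progress-based goal sampling with a simple running-average
# competence estimator (MAGELLAN instead estimates competence with the LLM).
import random

class LPGoalSampler:
    def __init__(self, goals: list[str], alpha: float = 0.1, eps: float = 0.1):
        self.goals = goals
        self.alpha = alpha                    # smoothing for competence estimates
        self.eps = eps                        # probability of uniform exploration
        self.fast = {g: 0.0 for g in goals}   # recent competence
        self.slow = {g: 0.0 for g in goals}   # older competence

    def update(self, goal: str, success: float) -> None:
        # Two timescales: learning progress is the gap between them.
        self.fast[goal] += self.alpha * (success - self.fast[goal])
        self.slow[goal] += self.alpha / 2 * (success - self.slow[goal])

    def learning_progress(self, goal: str) -> float:
        return abs(self.fast[goal] - self.slow[goal])

    def sample(self) -> str:
        if random.random() < self.eps:
            return random.choice(self.goals)  # keep probing all goals
        # Sampling proportional to |fast - slow| focuses on goals where
        # competence is currently changing.
        weights = [self.learning_progress(g) + 1e-6 for g in self.goals]
        return random.choices(self.goals, weights=weights)[0]

# Usage with an invented environment where only one goal is learnable.
sampler = LPGoalSampler(["grow a plant", "feed the cow", "open the door"])
for _ in range(100):
    g = sampler.sample()
    outcome = 1.0 if g == "open the door" else 0.0  # hypothetical outcome
    sampler.update(g, outcome)
print(max(sampler.goals, key=sampler.learning_progress))
```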