PhD defense on 13-10-2025

1 PhD defense from ED Mathématiques et Informatique - 1 PhD defense from ED Sciences de la Vie et de la Santé

13 Oct 2025

Université de Bordeaux

ED Mathématiques et Informatique

Language-guided autonomous deep reinforcement learning agents

by Thomas CARTA (Institut national de recherche en informatique et en automatique - Bordeaux - Sud-Ouest)
The defense will take place at 15h30 - Ada Lovelace Centre Inria de l'université de Bordeaux, 200 Av. de la Vieille Tour, 33405 Talence, France
in front of the jury composed of
- Pierre-Yves OUDEYER - Research Director - Université de Bordeaux - Thesis supervisor
- Olivier SIGAUD - Professor - Sorbonne Université - Thesis co-supervisor
- Martha WHITE - Associate Professor - University of Alberta - Examiner
- Edward HUGHES - Doctor - The London School of Economics - Examiner
- Jean PONCE - Professor - Ecole normale supérieure-PSL - Examiner
- Georg MARTIUS - Professor - University of Tübingen - Reviewer
- Pierre-Luc BACON - Associate Professor - University of Montreal's DIRO - Reviewer
Summary
Humans have long invented tools to overcome physical limitations and extend their capabilities, from stone tools to mechanical devices. This innovation eventually extended into the cognitive domain with early calculators and programmable computers. Today, artificial intelligence, especially large language models (LLMs), represents the latest stage in this trajectory, enabling humans to augment cognitive functions such as reasoning, creativity, and discovery.
Despite notable achievements in AI, such as fluent language generation and scientific prediction, current systems largely operate under human-defined objectives and strategies. True Artificial General Intelligence (AGI) remains a long-term goal, one where agents can operate in open-ended environments with unspecified goals and no pre-defined solution paths. A key challenge in reaching AGI lies in building open-ended agents: systems that can explore autonomously, learn continuously, and develop increasingly complex skills over time. These agents must not only gather new knowledge from their environment but also reinterpret and build upon past discoveries, much like humans do across the lifespan. Crucially, humans excel in self-generating goals based on intrinsic motivations, such as curiosity and learning progress, capabilities that artificial agents must emulate to achieve genuine open-endedness.
Recent work has begun to formally define what open-endedness in agents entails, emphasizing the production of novel, learnable outputs. Language plays a central role here, not only as a communication medium but as a powerful tool for abstract reasoning, generalization, and structured thought. Rooted in developmental theories, language can support exploration, goal generation, and planning in artificial agents.
To function effectively, an open-ended agent must operate in both the environment and a goal space. It must explore its surroundings with intrinsic motivation, adapt to evolving dynamics, and efficiently acquire new skills even as tasks grow more complex. This requires continual improvement in learning mechanisms to avoid stagnation and maintain progress.
Thesis Structure
This thesis explores how language can support the core competencies of open-ended agents through four parts:
Part I: Foundations introduces key concepts, including reinforcement learning, intrinsic motivation, and the capabilities of LLMs. It also surveys relevant literature along three dimensions: environment modeling, goal exploration, and efficient learning.
Part II: Modeling the Environment focuses on how agents can understand and simulate their surroundings. One method uses an LLM as an embodied reinforcement learner, grounding its understanding through interaction. Another uses the LLM to hypothesize and refine models based on environmental feedback.
Part III: Exploring the Goal Space examines how agents can interpret and generate linguistically stated goals. One approach evaluates an agent's competence using language-encoded goals, leveraging linguistic patterns for curriculum learning. Another proposes a goal-generation mechanism guided by the agent's learning progress.
Part IV: Learning Efficiently targets improvements in how agents learn policies. Techniques include language-based reward shaping using fill-in-the-blank prompts and hierarchical reinforcement learning where an LLM directs low-level skill use. These strategies enable scalable and reusable learning under computational constraints.
Conclusion
This thesis argues that language, when treated as a cognitive and structural tool, enables key abilities for open-ended learning. By embedding LLMs into the design of intelligent agents, we can advance toward systems capable of autonomous, lifelong discovery: agents that mirror the adaptive, curious, and creative nature of human intelligence.

ED Sciences de la Vie et de la Santé

Impact of DDR1a and DDR1b isoforms on the development of clear cell Renal Cell Carcinoma (ccRCC)

by Chloé REDOUTE (BoRdeaux Institute of onCology)
The defense will take place at 9h00 - Ground-floor amphitheatre, BBS building (Bâtiment Bordeaux Biologie Santé), 2 Rue Dr Hoffmann Martinot, 33000 Bordeaux
in front of the jury composed of
- Ulrich VALCOURT - University Professor - Laboratoire de Biologie Tissulaire et Ingénierie Thérapeutique (LBTI) UMR 5305 CNRS and Université Claude Bernard Lyon 1 - Reviewer
- Curzio RUEGG - Professor Emeritus - University of Fribourg, Switzerland - Reviewer
- Elisabeth GENOT - Research Director - Bioingénierie Tissulaire (BioTis) Inserm U1026 Université Bordeaux - Examiner
- Hamid MORJANI - Professor - BioSpectroscopie Translationnelle BioSpecT - EA7506 UFR de Pharmacie - Université de Reims - Examiner
- Isabelle SAGOT - Research Director - IBGC UMR 5095 - Université de Bordeaux - Examiner
Summary
Renal cell carcinomas (RCC) represent approximately 90% of kidney cancers, with clear cell carcinoma (ccRCC) being the most frequent subtype (75% of cases). Current treatments for ccRCC rely on combinations of tyrosine kinase inhibitors (TKI) and immune checkpoint inhibitors (anti-PD-1/PD-L1/CTLA-4). Despite these therapies, some patients in remission may present with late relapses, sometimes several decades after the end of treatment. This phenomenon may be attributed to the presence of dormant (quiescent) tumor cells within the target organs of metastases.
Among the many targets of TKIs is the DDR1 receptor (Discoidin Domain Receptor 1), whose ligands are fibrillar collagens. Five isoforms of DDR1 have been described (a–e), but only DDR1a and DDR1b are expressed in ccRCC. The DDR1b isoform differs from DDR1a by an additional sequence of 37 amino acids among which are two tyrosines: tyrosine 513 (Y513) and tyrosine 520 (Y520). Our analyses of TCGA data showed that high expression of DDR1 is correlated with better survival in patients with ccRCC, suggesting a potential protective role through the inhibition of tumor progression.
In order to explore the functional role of DDR1 and its isoforms, the 786-O cell line was modified to separately overexpress DDR1a or DDR1b, as well as mutated versions of DDR1b at Y513, Y520, or both simultaneously. Our in vitro studies revealed that in the presence of collagen I, only the DDR1a isoform decreases proliferation, migration, and cellular invasion. In addition, an enrichment of cells in the G0 phase of the cell cycle, accompanied by nuclear accumulation of p27, was observed, suggesting entry into quiescence possibly linked to tumor cell dormancy or a state of senescence. Both Y513 and Y520 participate in the phenotypes induced by DDR1b, and only the double mutant of DDR1b mimics the DDR1a phenotype: reduction of pro-tumoral properties, enrichment in G0 phase, and increase in nuclear p27.
The objectives of this project are to identify the intracellular signaling pathways involved in the quiescent state induced by DDR1a, to discriminate between reversible quiescence (dormancy) and irreversible quiescence (senescence), and to determine the importance of tyrosines 513 and/or 520 in the DDR1b signaling pathways that allow the reversal of quiescence. To identify the intracellular signaling pathways involved in the quiescent state, proteomic and kinomic approaches were carried out on cells expressing DDR1a and DDR1b. The FUCCI reporter system is used in vitro to characterize the ability of cells to re-enter the cell cycle, a discriminating criterion between dormancy and senescence. To demonstrate the importance of tyrosines 513 and 520 of DDR1b in the signaling pathways that allow the reversal of quiescence, most of the in vitro experiments have been or will be carried out with the different DDR1b mutant cells (Y513, Y520, or both). This project will demonstrate the role and importance of the different DDR1 isoforms in the development of ccRCC, in the quiescent state of ccRCC cells, and in their reactivation to form metastases.
