PhD defense on 24-09-2024

1 PhD defense from ED Sociétés, Politique, Santé Publique

Université de Bordeaux

ED Sociétés, Politique, Santé Publique

  • Clustering and differential analysis of gene expression data

    by Benjamin HIVERT (Bordeaux Population Health Research Center)

    The defense will take place at 14:00 in Amphithéâtre Louis (ISPED), Université de Bordeaux, 146 Rue Léo Saignat, 33000 Bordeaux

    in front of the jury composed of

    • Rodolphe THIEBAUT - University Professor and Hospital Practitioner - UNIVERSITE DE BORDEAUX - Thesis supervisor
    • Franck PICARD - Research Director - LABORATOIRE DE BIOLOGIE ET MODÉLISATION DE LA CELLULE - ENS DE LYON - Reviewer
    • Cathy MAUGIS-RABUSSEAU - Associate Professor - INSTITUT DE MATHÉMATIQUES DE TOULOUSE - Reviewer
    • Boris HEJBLUM - Research Scientist - INSERM U1219 - BORDEAUX POPULATION HEALTH - Thesis co-supervisor
    • Pierre NEUVIAL - Research Director - INSTITUT DE MATHÉMATIQUES DE TOULOUSE - Examiner
    • Cécile PROUST-LIMA - Research Director - INSERM U1219 - BORDEAUX POPULATION HEALTH - Examiner

    Summary

    Analyses of gene expression data obtained from bulk RNA sequencing (bulk RNA-seq) or single-cell RNA sequencing (scRNA-seq) have become commonplace in immunological studies. They allow for a better understanding of the heterogeneity present in immune responses, whether in reaction to vaccination or disease. Typically, the analysis of these data is conducted in two steps: i) first, an unsupervised classification, or clustering, is performed using all the genes to group samples into distinct and homogeneous subgroups; ii) then, a differential analysis is conducted using hypothesis tests to identify genes that are differentially expressed between these subgroups. However, these two successive steps lead to a methodological challenge that is often overlooked in the applied literature. Traditional inference methods require hypotheses to be fixed a priori and independent of the data to ensure effective control of the type I error. In these two-step analyses, the hypotheses tested are derived from the clustering results, and therefore from the data themselves, which compromises the control of the type I error by traditional methods and can lead to false discoveries. We propose new statistical methods that account for this double use of the data and ensure effective control of the number of false discoveries.
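
    The double use of the data described in the summary can be illustrated with a small simulation. The sketch below is not one of the methods proposed in the thesis; it only shows the problem. It assumes purely null data (independent standard Gaussian "genes", no true subgroups), a plain 2-means clustering, and a Welch-type test with a normal approximation for the p-value. All function names and parameter values are hypothetical choices made for this illustration.

```python
import random
from statistics import NormalDist, mean, variance


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def two_means(points, rng, n_iter=20):
    """Plain Lloyd's algorithm with k = 2; returns one 0/1 label per point."""
    centers = [list(c) for c in rng.sample(points, 2)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each sample to its nearest center.
        labels = [
            0 if squared_distance(p, centers[0]) <= squared_distance(p, centers[1]) else 1
            for p in points
        ]
        # Move each center to the mean of its current members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [mean(col) for col in zip(*members)]
    return labels


def welch_pvalue(x, y):
    """Two-sided p-value of a Welch statistic (normal approximation, an assumption)."""
    if len(x) < 2 or len(y) < 2:
        return 1.0  # a group this small cannot be tested; never reject
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejects(values, labels, alpha):
    """Test one gene between the two groups defined by `labels`."""
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    return welch_pvalue(g0, g1) < alpha


def rejection_rates(n_sim=200, n_samples=60, n_genes=2, alpha=0.05, seed=0):
    """Return (naive rate, control rate) of rejections under the global null."""
    rng = random.Random(seed)
    # Control grouping: fixed before seeing any data.
    fixed_labels = [i % 2 for i in range(n_samples)]
    naive = control = 0
    for _ in range(n_sim):
        # Null data: no gene differs between any subgroups.
        data = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
        first_gene = [row[0] for row in data]
        # Step i) cluster on all genes, step ii) test a gene between clusters.
        cluster_labels = two_means(data, rng)
        if rejects(first_gene, cluster_labels, alpha):
            naive += 1
        if rejects(first_gene, fixed_labels, alpha):
            control += 1
    return naive / n_sim, control / n_sim


if __name__ == "__main__":
    naive_rate, control_rate = rejection_rates()
    print(f"rejection rate with data-driven groups: {naive_rate:.2f}")
    print(f"rejection rate with fixed groups:       {control_rate:.2f}")
```

    Because every simulated gene is pure noise, any rejection is a false discovery. With groups fixed in advance, the rejection rate should stay close to the nominal level `alpha`, whereas with groups produced by the clustering it should be far higher, since the clustering itself separates the samples along the very values that are then tested.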