Towards minimal cells and beyond: the development and application of bioinformatic tools for large-scale genomic data analysis of endosymbiotic bacteria of insects
Over the last couple of decades, symbioses between insects and bacterial endosymbionts have been the focus of remarkable empirical studies. Models of symbiosis between specific bacterial lineages and their hosts have been described but, to our knowledge, no large-scale analyses have been carried out to begin deciphering the overall evolutionary path of the endosymbiosis phenomenon. Insects represent about 85% of animal diversity, and about 60% of them maintain symbiotic relationships with microbes, which mainly allow their hosts to live in niches otherwise unavailable to them by providing nutrients, protection, and even new forms of energy. Bacterial endosymbionts often live within specialized insect cells called bacteriocytes; they generally show a compositional bias towards A+T in their genomes, undergo genome shrinkage, and evolve at an accelerated rate at the sequence level. All of these attributes are convergent with those of organelles, reflecting their similarly long and intertwined evolutionary histories; in this work, we therefore propose the term "symbionelles" for long-term obligate bacterial endosymbionts of insects. Most of these organisms have among the smallest genomes found in nature, which makes them good models for studying minimal cells through genomic and metabolomic analyses, the subject of two chapters of this thesis. Recent changes in technology have made it necessary to look for new and creative ways to handle and process large amounts of data. With completely sequenced and annotated genomes of endosymbiotic bacteria of insects now available, databases are indispensable tools for organizing and easily accessing specific biological information. We constructed and published a composite database that includes the genomic data of symbiotic relationships between bacteria and insects, as well as of all other symbiotic relationships found in primary databases, since the construction process was the same for both.
Our database includes confirmation of each symbiotic relationship (validated in the literature), the associated publication (a link to the original journal article in which the association was first described), organized and accessible sequences of all genes, genomes, and orthologs of each prokaryotic symbiont, and the metabolic network of every organism in the repository as an m-DAG. The m-DAG is a new reaction-based methodology, applied to genomic data, that creates contracted metabolic diagrams by connecting directed acyclic graphs and using metabolic building blocks as the nodes of a metabolic network. By comparing these reaction-based metabolic networks with standard metabolite-based ones, we were able to examine the differences and similarities between organisms that have evolved in different ways, such as being more or less involved in endosymbiosis.
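
To make the idea of a contracted, reaction-based network more concrete, the following is a minimal sketch and not the implementation used in this work. It assumes, which the text above does not state, that a metabolic building block can be approximated by a set of reactions that are mutually reachable in a directed reaction graph, so that contracting each such set yields a directed acyclic graph. The reaction identifiers (R1 to R4) and the function name `condense_reaction_graph` are hypothetical.

```python
from collections import defaultdict


def condense_reaction_graph(edges):
    """Contract mutually reachable reactions into blocks (Kosaraju's algorithm).

    `edges` is an iterable of (source_reaction, target_reaction) pairs.
    Returns (blocks, dag_edges): a list of frozensets of reactions, and the
    set of edges between block indices. Assumption: a block stands in for a
    "metabolic building block"; the real m-DAG construction may differ.
    """
    edges = list(edges)
    graph = defaultdict(set)
    reverse = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        graph[src].add(dst)
        reverse[dst].add(src)
        nodes.update((src, dst))

    # Pass 1: iterative depth-first search, recording nodes as they finish.
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(sorted(graph[start])))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(sorted(graph[nxt]))))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in reverse finishing order;
    # each traversal collects exactly one strongly connected block.
    block_of, blocks = {}, []
    for start in reversed(order):
        if start in block_of:
            continue
        index = len(blocks)
        block_of[start] = index
        members, todo = [], [start]
        while todo:
            node = todo.pop()
            members.append(node)
            for prev in reverse[node]:
                if prev not in block_of:
                    block_of[prev] = index
                    todo.append(prev)
        blocks.append(frozenset(members))

    # Keep only edges that cross between different blocks.
    dag_edges = {
        (block_of[s], block_of[d]) for s, d in edges if block_of[s] != block_of[d]
    }
    return blocks, dag_edges


# Hypothetical toy network: R2 and R3 form a cycle and collapse into one block.
example_edges = [("R1", "R2"), ("R2", "R3"), ("R3", "R2"), ("R3", "R4")]
blocks, dag_edges = condense_reaction_graph(example_edges)
```

In this toy case the four reactions contract to three blocks joined by two edges, which illustrates how a contracted diagram can be smaller than the underlying reaction graph while remaining acyclic.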