[hal-04380551] Consistency and asymptotic normality of stochastic block models estimators from sampled data

3 months 1 week ago
Statistical analysis of networks is an active research area, and the literature counts many papers concerned with network models and the statistical analysis of networks. However, very few papers deal with missing data in network analysis, even though, in practice, networks are often observed with missing values. In this paper we focus on the Stochastic Block Model with valued edges and consider a missing completely at random (MCAR) setting by assuming that every dyad (pair of nodes) is sampled identically and independently of the others with probability ρ > 0. We prove that maximum likelihood estimators and their variational approximations are consistent and asymptotically normal in the presence of missing data as soon as the sampling probability ρ satisfies ρ ≫ log(n)/n.
Mahendra Mariadassou
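
To make the sampling scheme in this abstract concrete, here is a minimal simulation sketch. It assumes binary (Bernoulli) edges rather than the valued edges treated in the paper, and the function name `simulate_sampled_sbm` and its parameters are hypothetical choices for illustration, not the authors' code. Each dyad is observed independently with probability `rho`; unobserved dyads are stored as NaN.

```python
import numpy as np

def simulate_sampled_sbm(n, block_probs, connectivity, rho, seed=0):
    """Simulate an undirected binary SBM whose dyads are observed MCAR.

    Assumption: Bernoulli edges (the paper covers valued edges more generally).
    Returns the block labels and an n x n matrix with NaN for unobserved dyads.
    """
    rng = np.random.default_rng(seed)
    k = len(block_probs)
    # Latent block membership of each node.
    z = rng.choice(k, size=n, p=block_probs)
    # Work on the upper triangle so each dyad is drawn exactly once.
    rows, cols = np.triu_indices(n, k=1)
    p_edge = np.asarray(connectivity)[z[rows], z[cols]]
    edges = rng.random(p_edge.shape) < p_edge
    # MCAR sampling: every dyad is observed independently with probability rho.
    observed = rng.random(p_edge.shape) < rho
    values = np.where(observed, edges.astype(float), np.nan)
    adj = np.full((n, n), np.nan)  # diagonal stays NaN (no self-loops)
    adj[rows, cols] = values
    adj[cols, rows] = values
    return z, adj

z, adj = simulate_sampled_sbm(
    n=200,
    block_probs=[0.5, 0.5],
    connectivity=[[0.30, 0.05], [0.05, 0.30]],
    rho=0.4,
)
observed_fraction = np.mean(~np.isnan(adj[np.triu_indices(200, k=1)]))
print(f"observed fraction of dyads: {observed_fraction:.3f}")
```

An estimator fitted to `adj` would only use the non-NaN entries; the abstract's condition on ρ describes how fast that observed fraction may shrink with n while the estimators remain consistent.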

[hal-02903118] Assessing the quality of fresh Whitemouth croaker (Micropogonias furnieri) meat based on micro‐organism and histamine analysis using NGS, qPCR and HPLC‐DAD

4 months 4 weeks ago
Aims: Quality evaluation of fresh whitemouth croaker (Micropogonias furnieri) by histamine determination using the HPLC‐DAD method and quantification of histamine‐forming bacteria using NGS and qPCR. Methods and Results: The histamine content of fresh whitemouth croaker was determined by high performance liquid chromatography with diode array detector, with concentrations ranging from 258·52 to 604·62 mg kg−1. The number of histidine decarboxylase (hdc gene) copies from Gram‐negative bacteria and from the bacteria Morganella morganii and Enterobacter aerogenes was quantified by quantitative polymerase chain reaction. All samples were positive, with copy numbers of the hdc gene ranging from 4·67 to 12·01 log10 per g. The microbial community was determined by sequencing the V4 region of the 16S rRNA gene using the Ion Torrent platform. The bioinformatics data generated by frog software showed that the phylum Proteobacteria was the most abundant, with the family Moraxellaceae being more prevalent in samples collected in the summer, whereas the Pseudomonadaceae were more prevalent in the winter. Conclusions: All fish muscle samples analysed in this study presented histamine values higher than those allowed by CODEX Alimentarius. Additionally, a wide variety of spoilage micro‐organisms capable of expressing the enzyme histidine decarboxylase were detected. Thus, improvements in handling and processing are required to minimize the prevalence of histamine‐producing bacteria in fish. Significance and Impact of the Study: Global fish production in 2016 was 171 million tons, with the largest consumer being China, followed by Indonesia and the USA. In Brazil, 1·3 million tons of fish are consumed per year, with whitemouth croaker being the main fish landed. Notably, cases associated with histamine poisoning are quite common. According to the European Food Safety Authority and European Centre for Disease Prevention and Control, a total of 599 histamine fish poisoning (HFP) outbreaks were identified in the European Union during the period 2010–2017. In the USA, there were 333 outbreaks with 1383 people involved between 1998 and 2008.
Alessandra Danile de Lira

[hal-03516147] Lactococcus lactis Diversity Revealed by Targeted Amplicon Sequencing of purR Gene, Metabolic Comparisons and Antimicrobial Properties in an Undefined Mixed Starter Culture Used for Soft-Cheese Manufacture

1 year 7 months ago
The undefined mixed starter culture (UMSC) is used in the manufacture of cheeses. Deciphering UMSC microbial diversity is important to optimize industrial processes. The UMSC was studied using culture-dependent and culture-independent based methods. MALDI-TOF MS enabled identification of species primarily from the Lactococcus genus. Comparisons of carbohydrate metabolism profiles allowed to discriminate five phenotypes of Lactococcus (n = 26/1616). The 16S sequences analysis (V1–V3, V3–V4 regions) clustered the UMSC microbial diversity into two Lactococcus operational taxonomic units (OTUs). These clustering results were improved with the DADA2 algorithm on the housekeeping purR sequences. Five L. lactis variants were detected among the UMSC. The whole-genome sequencing of six isolates allowed for the identification of the lactis subspecies using Illumina® (n = 5) and Pacbio® (n = 1) technologies. Kegg analysis confirmed the L. lactis species-specific niche adaptations and highlighted a progressive gene pseudogenization. Then, agar spot tests and agar well diffusion assays were used to assess UMSC antimicrobial activities. Of note, isolate supernatants (n = 34/1616) were shown to inhibit the growth of Salmonella ser. Typhimurium CIP 104115, Lactobacillus sakei CIP 104494, Staphylococcus aureus DSMZ 13661, Enterococcus faecalis CIP103015 and Listeria innocua CIP 80.11. Collectively, these results provide insightful information about UMSC L. lactis diversity and revealed a potential application as a bio-protective starter culture.
Sabrina Saltaji

[hal-02907466] Integrating independent microbial studies to build predictive models of anaerobic digestion inhibition by ammonia and phenol

1 year 7 months ago
Anaerobic digestion (AD) is a microbial process that can efficiently degrade organic waste into renewable energies such as methane-rich biogas. However, the underpinning microbial mechanisms are highly vulnerable to a wide range of inhibitory compounds, leading to process failure and economic losses. High-throughput sequencing technologies enable the identification of microbial indicators of digester inhibition and can provide new insights into the key phylotypes at stake during the AD process. Yet current studies have used different inocula, substrates, geographical sites and types of reactors, resulting in indicators that are not robust or reproducible across independent studies. In addition, such studies focus on the identification of a single microbial indicator that is not reflective of the complexity of AD. Our study proposes the first analysis of its kind that seeks a robust signature of microbial indicators of phenol and ammonia inhibition, whilst leveraging 4 independent in-house and external AD microbial studies. We applied a recent multivariate integrative method on two in-house studies to identify such a signature, then predicted the inhibitory status of samples from two datasets with more than 90% accuracy. Our study demonstrates how we can efficiently analyze existing studies to extract robust microbial community patterns, predict AD inhibition, and deepen our understanding of AD towards better AD microbial management.
Simon Poirier

[hal-02633276] Lactococcus lactis Diversity Revealed by Targeted Amplicon Sequencing of purR Gene, Metabolic Comparisons and Antimicrobial Properties in an Undefined Mixed Starter Culture Used for Soft-Cheese Manufacture

1 year 8 months ago
The undefined mixed starter culture (UMSC) is used in the manufacture of cheeses. Deciphering UMSC microbial diversity is important to optimize industrial processes. The UMSC was studied using culture-dependent and culture-independent based methods. MALDI-TOF MS enabled identification of species primarily from the Lactococcus genus. Comparisons of carbohydrate metabolism profiles allowed to discriminate five phenotypes of Lactococcus (n = 26/1616). The 16S sequences analysis (V1-V3, V3-V4 regions) clustered the UMSC microbial diversity into two Lactococcus operational taxonomic units (OTUs). These clustering results were improved with the DADA2 algorithm on the housekeeping purR sequences. Five L. lactis variants were detected among the UMSC. The whole-genome sequencing of six isolates allowed for the identification of the lactis subspecies using Illumina® (n = 5) and Pacbio® (n = 1) technologies. Kegg analysis confirmed the L. lactis species-specific niche adaptations and highlighted a progressive gene pseudogenization. Then, agar spot tests and agar well diffusion assays were used to assess UMSC antimicrobial activities. Of note, isolate supernatants (n = 34/1616) were shown to inhibit the growth of Salmonella ser. Typhimurium CIP 104115, Lactobacillus sakei CIP 104494, Staphylococcus aureus DSMZ 13661, Enterococcus faecalis CIP103015 and Listeria innocua CIP 80.11. Collectively, these results provide insightful information about UMSC L. lactis diversity and revealed a potential application as a bio-protective starter culture.
Sabrina Saltaji

[hal-03285327] Detection of selection signatures in Limousin cattle using whole‐genome resequencing

2 years 8 months ago
Limousin, a renowned beef breed originating from central France, has been selectively bred over the last 100 years to improve economically important traits. We used whole-genome sequencing data from 10 unrelated Limousin bull calves to detect polymorphisms and identify regions under selection. A total of 13 943 766 variants were identified. Moreover, 311 852 bi-allelic SNPs and 92 229 indels located on autosomes were fixed for the alternative allele in all sequenced animals, including the previously reported missense deleterious F94L mutation in MSTN. We performed a whole-genome screen to discover genomic regions with excess homozygosity, using the pooled heterozygosity score, and identified 171 different candidate selective sweeps. In total, 68 candidate genes were found in only 57 of these regions, indicating that a large fraction of the genome under selection might lie in non-coding regions and suggesting that a majority of adaptive mutations might be regulatory in nature. Many QTL were found within candidate selective sweep regions, including QTL associated with shear force or carcass weight. Among the putative selective sweeps, we located genes (MSTN, NCKAP5, RUNX2) that potentially contribute to important phenotypes in Limousin. Several candidate regions and genes under selection were also found in previous genome-wide selection scans performed in Limousin. In addition, we were able to pinpoint candidate causative regulatory polymorphisms in GRIK3 and RUNX2 that might have been under selection. Our results will contribute to improved understanding of the mechanisms and targets of artificial selection and will facilitate the interpretation of GWASs performed in Limousin.
M. Mariadassou
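
The pooled heterozygosity score mentioned in this abstract can be illustrated with a short sketch. It assumes the commonly used definition Hp = 2·Σn_MAJ·Σn_MIN / (Σn_MAJ + Σn_MIN)², where the sums run over the SNPs of one genomic window; the abstract does not spell out the formula, window size or thresholds, so the function below is a hypothetical illustration rather than the authors' pipeline.

```python
def pooled_heterozygosity(major_counts, minor_counts):
    """Pooled heterozygosity (Hp) of one genomic window.

    major_counts / minor_counts: per-SNP counts of the major and minor
    allele observed in the window (assumed input layout).
    Hp ranges from 0 (no variation, excess homozygosity) to 0.5.
    """
    n_major = sum(major_counts)
    n_minor = sum(minor_counts)
    total = n_major + n_minor
    if total == 0:
        raise ValueError("window contains no allele observations")
    return 2.0 * n_major * n_minor / total ** 2

# A window where every SNP is fixed for one allele has Hp = 0.
print(pooled_heterozygosity([20, 20, 20], [0, 0, 0]))
# A window with balanced alleles reaches the maximum Hp = 0.5.
print(pooled_heterozygosity([10, 10], [10, 10]))
```

Windows with unusually low Hp relative to the genome-wide distribution are the ones typically flagged as candidate selective sweeps.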

[hal-03155990] Unraveling the history of the genus Gallus through whole genome sequencing

2 years 8 months ago
The genus Gallus is distributed across a large part of Southeast Asia and has received special interest because the domestic chicken, Gallus gallus domesticus, has spread all over the world and is a major protein source for humans. There are four species: the red junglefowl (G. gallus), the green junglefowl (G. varius), the Lafayette's junglefowl (G. lafayettii) and the grey junglefowl (G. sonneratii). The aim of this study is to reconstruct the history of these species by a whole genome sequencing approach and resolve inconsistencies between well supported topologies inferred using different data and methods. Using deep sequencing, we identified over 35 million SNPs and reconstructed the phylogeny of the Gallus genus using both distance (BioNJ) and maximum likelihood (ML) methods. We observed discrepancies according to reconstruction methods and genomic components. The two most supported topologies were previously reported and were discriminated by using phylogenetic and gene flow analyses, based on ABBA statistics. These analyses support a scenario with G. gallus as the earliest branching lineage of the Gallus genus, instead of G. varius. We discuss the probable causes for the discrepancy. A likely one is that G. sonneratii samples from parks or private collections are all recent hybrids, with roughly 10% of their autosomal genome originating from G. gallus. The removal of those regions is needed to provide reliable data, which was not done in previous studies. We took care of this and additionally included two wild G. sonneratii samples from India, showing no trace of introgression. This reinforces the importance of carefully selecting and validating samples and genomic components in phylogenomics.
Mahendra Mariadassou
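
As a rough illustration of the gene flow test referred to in this abstract, the sketch below computes an ABBA-BABA statistic in the form of Patterson's D, D = (nABBA − nBABA) / (nABBA + nBABA). It assumes one haploid allele per taxon and per site, ordered as (P1, P2, P3, outgroup); the abstract does not state which exact variant of the statistic was used, so this is an assumption, and `patterson_d` is a hypothetical name.

```python
def patterson_d(sites):
    """Patterson's D from per-site alleles ordered (P1, P2, P3, outgroup).

    ABBA sites: P2 and P3 share the derived allele, P1 matches the outgroup.
    BABA sites: P1 and P3 share the derived allele, P2 matches the outgroup.
    D near 0 is expected without gene flow; a clear excess of one pattern
    suggests introgression between P3 and P2 (D > 0) or P3 and P1 (D < 0).
    """
    abba = baba = 0
    for p1, p2, p3, outgroup in sites:
        if p1 == outgroup and p2 == p3 and p2 != outgroup:
            abba += 1
        elif p2 == outgroup and p1 == p3 and p1 != outgroup:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)

# Toy alignment: three ABBA sites, one BABA site, one uninformative site.
toy_sites = ["ABBA", "ABBA", "ABBA", "BABA", "AAAA"]
print(patterson_d(toy_sites))  # (3 - 1) / (3 + 1) = 0.5
```

In practice the significance of D is assessed with a block jackknife over the genome, which this sketch leaves out.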

[hal-03270230] Statistical modelling of bacterial promoter sequences for regulatory motif discovery with the help of transcriptome data: application to Listeria monocytogenes

2 years 9 months ago
Automatic de novo identification of the main regulons of a bacterium from genome and transcriptome data remains a challenge. To address this task, we propose a statistical model that can use information on exact positions of the transcription start sites and condition-dependent expression profiles. The central idea of this model is to improve the probabilistic representation of the promoter DNA sequences by incorporating covariates summarizing expression profiles (e.g. coordinates in projection spaces or hierarchical clustering trees). A dedicated trans-dimensional Markov chain Monte Carlo algorithm adjusts the width and palindromic properties of the corresponding position-weight matrices, the number of parameters to describe exact position relative to the transcription start site, and chooses the expression covariates relevant for each motif. All parameters are estimated simultaneously, for many motifs and many expression covariates. The method is applied to a dataset of transcription start sites and expression profiles available for Listeria monocytogenes. The results validate the approach and provide a new global view of the transcription regulatory network of this important pathogen. Remarkably, a previously unreported motif is found in promoter regions of ribosomal protein genes, suggesting a role in the regulation of growth.
Ibrahim Sultan

[hal-03225018] A Probiotic Mixture Induces Anxiolytic- and Antidepressive-Like Effects in Fischer and Maternally Deprived Long Evans Rats

2 years 11 months ago
A role of the gut microbiota in psychiatric disorders is supported by a growing body of literature. The effects of a probiotic mixture of four bacterial strains were studied in two models of anxiety and depression, naturally stress-sensitive Fischer rats and Long Evans rats subjected to maternal deprivation. Rats chronically received either the probiotic mixture (1 × 10^9 CFU/day) or the vehicle. Anxiety- and depressive-like behaviors were evaluated in several tests. Brain monoamine levels and gut RNA expression of tight junction proteins (Tjp) and inflammatory markers were quantified. The gut microbiota was analyzed in feces by 16S rRNA gene sequencing. Untargeted metabolite analysis reflecting primary metabolism was performed in the cecal content and in serum. Fischer rats treated with the probiotic mixture manifested a decrease in anxiety-like behaviors, in the immobility time in the forced swimming test, as well as in levels of dopamine and its major metabolites, and those of serotonin metabolites in the hippocampus and striatum. In maternally deprived Long Evans rats treated with the probiotic mixture, the number of entries into the central area in the open-field test was increased, reflecting an anxiolytic effect. The probiotic mixture increased Tjp1 and decreased Ifn gamma mRNA levels in the ileum of maternally deprived rats. In both models, probiotic supplementation changed the proportions of several Operational Taxonomic Units (OTU) in the gut microbiota, and the levels of certain cecal and serum metabolites were correlated with behavioral changes. Chronic administration of the tested probiotic mixture can therefore beneficially affect anxiety- and depressive-like behaviors in rats, possibly owing to changes in the levels of certain metabolites, such as 21-deoxycortisol, and changes in brain monoamines.
Valérie Daugé

[hal-03019865] Function-Driven Design of Lactic Acid Bacteria Co-cultures to Produce New Fermented Food Associating Milk and Lupin

3 years 4 months ago
Designing bacterial co-cultures adapted to ferment mixes of vegetal and animal resources for food diversification and sustainability is becoming a challenge. Among bacteria used in food fermentation, lactic acid bacteria (LAB) are good candidates, as they are used as starter or adjunct in numerous fermented foods, where they allow preservation, enhanced digestibility, and improved flavor. We developed here a strategy to design LAB co-cultures able to ferment a new food made of bovine milk and lupin flour, consisting in: (i) in silico preselection of LAB species for targeted carbohydrate degradation; (ii) in vitro screening of 97 strains of the selected species for their ability to ferment carbohydrates and hydrolyze proteins from milk and lupin and clustering strains that displayed similar phenotypes; and (iii) assembling strains randomly sampled from clusters that showed complementary phenotypes. The designed co-cultures successfully expressed the targeted traits i.e., hydrolyzed proteins and degraded raffinose family oligosaccharides of lupin and lactose of milk in a large range of concentrations. They also reduced an off-flavor-generating volatile, hexanal, and produced various desirable flavor compounds. Most of the strains in co-cultures achieved higher cell counts than in monoculture, suggesting positive interactions. This work opens new avenues for the development of innovative fermented food products based on functionally complementary strains in the worldwide context of diet diversification.
Fanny Canon

[hal-02922962] Abundance, Diversity and Role of ICEs and IMEs in the Adaptation of Streptococcus salivarius to the Environment

3 years 7 months ago
Streptococcus salivarius is a significant contributor to the human oral, pharyngeal and gut microbiomes that contribute to the maintenance of health. The high genomic diversity observed in this species is mainly caused by horizontal gene transfer. This work aimed to evaluate the contribution of integrative and conjugative elements (ICEs) and integrative and mobilizable elements (IMEs) in S. salivarius genome diversity. For this purpose, we performed an in-depth analysis of 75 genomes of S. salivarius and searched for signature genes of conjugative and mobilizable elements. This analysis led to the retrieval of 69 ICEs, 165 IMEs and many decayed elements showing their high prevalence in S. salivarius genomes. The identification of almost all ICE and IME boundaries allowed the identification of the genes in which these elements are inserted. Furthermore, the exhaustive analysis of the adaptation genes carried by these elements showed that they encode numerous functions such as resistance to stress, to antibiotics or to toxic compounds, and numerous enzymes involved in diverse cellular metabolic pathways. These data support the idea that not only ICEs but also IMEs and decayed elements play an important role in S. salivarius adaptation to the environment.
Julie Lao

[hal-02914971] Large-scale multivariate dataset on the characterization of microbiota diversity, microbial growth dynamics, metabolic spoilage volatilome and sensorial profiles of two industrially produced meat products subjected to changes in lactate…

3 years 8 months ago
Data in this article provide detailed information on the diversity of bacterial communities present on 576 samples of raw pork or poultry sausages produced industrially in 2017. Bacterial growth dynamics and diversity were monitored throughout the refrigerated storage period to estimate the impact of packaging atmosphere and the use of potassium lactate as chemical preservative. The data include several types of analysis aiming at providing a comprehensive microbial ecology of spoilage during storage and how the process parameters do influence this phenomenon. The analysis includes: the gas content in packaging, pH, chromametric measurements, plate counts (total mesophilic aerobic flora and lactic acid bacteria), sensorial properties of the products, meta-metabolomic quantification of volatile organic compounds and bacterial community metagenetic analysis. Bacterial diversity was monitored using two types of amplicon sequencing (16S rRNA and GyrB encoding genes) at different time points for the different conditions (576 samples for gyrB and 436 samples for 16S rDNA). Sequencing data were generated by using Illumina MiSeq. The sequencing data have been deposited in the bioproject PRJNA522361. Samples accession numbers vary from SAMN10964863 to SAMN10965438 for gyrB amplicon and from SAMN10970131 to SAMN10970566 for 16S.
Simon Poirier

[hal-02914869] DUGMO: tool for the detection of unknown genetically modified organisms with high-throughput sequencing data for pure bacterial samples

3 years 8 months ago
Background: The European Community has adopted very restrictive policies regarding the dissemination and use of genetically modified organisms (GMOs). In fact, a maximum threshold of 0.9% of contaminating GMOs is tolerated for a "GMO-free" label. In recent years, imports of undescribed GMOs have been detected. Their sequences are not described and therefore not detectable by conventional approaches, such as PCR. Results: We developed DUGMO, a bioinformatics pipeline for the detection of genetically modified (GM) bacteria, including unknown GM bacteria, based on Illumina paired-end sequencing data. The method is currently focused on the detection of GM bacteria with (possibly partial) transgenes in pure bacterial samples. In the preliminary steps, coding sequences (CDSs) are aligned through two successive BLASTN against the host pangenome with relevant tuned parameters to discriminate CDSs belonging to the wild type genome (wgCDS) from potential GM coding sequences (pgmCDSs). Then, Bray-Curtis distances are calculated between the wgCDS and each pgmCDS, based on the difference of genomic vocabulary. Finally, two machine learning methods, namely the Random Forest and Generalized Linear Model, are carried out to target true GM CDS(s), based on six variables including Bray-Curtis distances and GC content. Tests carried out on a GM Bacillus subtilis showed 25 positive CDSs corresponding to the chloramphenicol resistance gene and CDSs of the inserted plasmids. On a wild type B. subtilis, no false positive sequences were detected. Conclusion: DUGMO detects exogenous CDS, truncated, fused or highly mutated wild CDSs in high-throughput sequencing data, and was shown to be efficient at detecting GM sequences, but it might also be employed for the identification of recent horizontal gene transfers.
Julie Hurel
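
The Bray-Curtis step described in this abstract can be sketched as follows. The sketch assumes that "genomic vocabulary" means k-mer counts and uses k = 3 on raw counts; both are illustrative assumptions, as are the helper names `kmer_counts` and `bray_curtis`, and none of this is taken from the DUGMO source code.

```python
from collections import Counter

def kmer_counts(sequence, k=3):
    """Count overlapping k-mers in a DNA sequence (assumed 'vocabulary')."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two count profiles.

    Equals 0 for identical profiles and 1 when no k-mer is shared.
    """
    keys = set(counts_a) | set(counts_b)
    # Counter returns 0 for absent k-mers, so min() handles missing keys.
    shared = sum(min(counts_a[key], counts_b[key]) for key in keys)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    return 1.0 - 2.0 * shared / total

wild_type = kmer_counts("ATGGCGAAAGCGATTGCGAAAGCG")
candidate = kmer_counts("ATGCCCTTTCCCGGGTTTCCCAAA")
print(bray_curtis(wild_type, wild_type))  # identical profiles give 0.0
print(bray_curtis(wild_type, candidate))  # larger value: different vocabulary
```

A candidate CDS whose profile sits far from the wild-type profile would contribute a high distance value to the downstream classifier features.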

[hal-02905303] The complete genome sequence of Mycobacterium bovis Mb3601, a SB0120 spoligotype strain representative of a new clonal group

3 years 8 months ago
Mycobacterium bovis strain Mb3601 was isolated from the lymph node of an infected bovine in an area of Burgundy, France, where bovine tuberculosis is highly enzootic. It was selected to obtain a complete genome for a new clonal complex, mainly constituted by SB0120-spoligotype strains, that we propose to name "European 3". It was recently described as "clonal group I" based on whole-genome SNP analysis of 87 French strains. Here we describe the 4,365,068 bp complete genome obtained by the combination of PacBio and Illumina technologies. This genome of 65.64% G + C content includes 4024 predicted protein-coding genes, 52 tRNA, 3 rRNA and 11 copies of IS6110.
Maxime Branger

[hal-02790980] Taxon appearance from extraction and amplification steps demonstrates the value of multiple controls in tick microbiome analysis

3 years 9 months ago
The development of high throughput sequencing (HTS) technologies has substantially improved analysis of bacterial community diversity, composition, and functions. Over the last decade, HTS has been used extensively to identify the diversity and composition of tick microbial communities. However, a growing number of studies are warning about the impact of contamination introduced at the different steps of the analytical process, from DNA extraction to amplification. In low biomass samples, e.g. individual tick samples, these contaminants may represent a large part of the obtained sequences, and thus generate considerable errors in downstream analyses and in the interpretation of results. Most studies of tick microbiota either do not mention the inclusion of controls during the DNA extraction or amplification steps, or consider the lack of an electrophoresis signal as an absence of contamination. In this context, we aimed to assess the proportion of contaminant sequences resulting from these steps. We analyzed the microbiota of individual Ixodes ricinus ticks by including several categories of controls throughout the analytical process: crushing, DNA extraction, and DNA amplification. Results: Controls yielded a significant number of sequences (1,126 to 13,198 mean sequences, depending on the control category). Some operational taxonomic units (OTUs) detected in these controls belong to genera reported in previous tick microbiota studies. In this study, these OTUs accounted for 50.9% of the total number of sequences in our samples, and were considered contaminants. Contamination levels (i.e. the percentage of sequences belonging to OTUs identified as contaminants) varied with tick stage and gender: 76.3% of nymphs and 75% of males demonstrated contamination over 50%, while most females (65.7%) had rates lower than 20%. Contamination mainly corresponded to OTUs detected in crushing and DNA extraction controls, highlighting the importance of carefully controlling these steps. Conclusion: Here, we showed that contaminant OTUs from extraction and amplification steps can represent more than half the total sequence yield in sequencing runs, and lead to unreliable results when characterizing tick microbial communities. We thus strongly advise the routine use of blanks and negative controls in tick microbiota studies, and more generally in studies involving low biomass.
Emilie Lejal

[hal-02635268] Incorporating Phylogenetic Information in Microbiome Differential Abundance Studies Has No Effect on Detection Power and FDR Control

3 years 10 months ago
We consider the problem of incorporating evolutionary information (e.g., taxonomic or phylogenetic trees) in the context of metagenomics differential analysis. Recent results published in the literature propose different ways to leverage the tree structure to increase the detection rate of differentially abundant taxa. Here, we propose instead to use a different hierarchical structure, in the form of a correlation-based tree, as it may capture the structure of the data better than the phylogeny. We first show that the correlation tree and the phylogeny are significantly different before turning to the impact of tree choice on detection rates. Using synthetic data, we show that the tree does have an impact: smoothing p-values according to the phylogeny leads to detection rates equal or inferior to those obtained by smoothing according to the correlation tree. However, both trees are outperformed by the classical, non-hierarchical, Benjamini-Hochberg (BH) procedure in terms of detection rates. Other procedures may use the hierarchical structure with profit but do not control the False Discovery Rate (FDR) a priori and remain inferior to a classical Benjamini-Hochberg procedure with the same nominal FDR. On real datasets, no hierarchical procedure had a significantly higher detection rate than BH. Intuition suggests that the use of hierarchical structures should increase the detection rate of differentially abundant taxa in microbiome studies. However, our results suggest that current hierarchical procedures are still inferior to standard methods and more effective procedures remain to be invented.
Antoine Bichat
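
For readers unfamiliar with the baseline this abstract compares against, the following is a minimal sketch of the classical Benjamini-Hochberg step-up procedure. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions; the hierarchical smoothing procedures studied in the paper are not reproduced here.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean mask of hypotheses rejected by the BH procedure.

    Step-up rule: find the largest rank k with p_(k) <= alpha * k / m,
    then reject the k hypotheses with the smallest p-values.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # last sorted index passing the test
        rejected[order[:k + 1]] = True
    return rejected

# Hypothetical per-taxon p-values from a differential abundance test.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))
```

With these ten p-values only the two smallest fall under their thresholds (0.005 and 0.010), so only the first two taxa are declared differentially abundant at a nominal FDR of 5%.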