High-throughput sequencing, bioinformatics and machine learning as a soil health diagnostic tool
The overall goal of this interdisciplinary project is to design a new method and a prototype diagnostic tool to evaluate the impact of pesticides and soil management systems on the health and quality of vineyard soils, based (i) on ultra-high-throughput sequencing data and (ii) on their processing by predictive machine learning methods. The impact of farming methods and pesticides on soil quality and health is a growing concern for consumers, farmers and soil managers.
To assess this impact, bioindicators such as protists show great potential, but their use is limited because current methods do not allow detailed and fast analysis of soil samples.
To overcome these drawbacks, species identification based on DNA sequences (barcodes) coupled with new ultra-high-throughput sequencing techniques represents a promising approach. However, the huge number of sequences and their high complexity make them difficult to process by conventional means. It therefore becomes essential to develop methods combining bioinformatics and machine learning in order to (i) quantify, analyze, and process protist sequences; (ii) identify and select bioindicators (a subset of protist operational taxonomic units, OTUs) associated with different stress factors; and (iii) model their relative abundance as a function of the different conditions, leading to the construction of diagnostic models. For this project, we propose to develop a biomonitoring approach in vineyard soils based on the quantification of protist metabarcoding data and on the predictive power of machine learning.
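The three steps named above (quantification, bioindicator selection, diagnostic modelling) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the OTU counts are invented toy values, the function names are hypothetical, the selection criterion (difference in mean relative abundance between two conditions) is a deliberately simple stand-in, and the nearest-centroid classifier only takes the place of whatever machine learning method the project would actually use.

```python
from statistics import mean


def relative_abundance(counts):
    """Convert one sample's raw OTU read counts to proportions that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def select_indicator_otus(samples, labels, top_k):
    """Return indices of the top_k OTUs whose mean relative abundance
    differs most between two conditions (illustrative criterion only)."""
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two conditions")
    first, second = groups
    scores = []
    for j in range(len(samples[0])):
        mean_first = mean(s[j] for s, y in zip(samples, labels) if y == first)
        mean_second = mean(s[j] for s, y in zip(samples, labels) if y == second)
        scores.append((abs(mean_first - mean_second), j))
    # Largest difference first; ties broken by OTU index.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return sorted(j for _, j in scores[:top_k])


def fit_centroids(samples, labels, otus):
    """Compute the mean profile of each condition over the selected OTUs."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == label]
        centroids[label] = [mean(r[j] for r in rows) for j in otus]
    return centroids


def predict(sample, centroids, otus):
    """Assign a sample to the condition with the nearest centroid."""
    def squared_distance(centroid):
        return sum((sample[j] - c) ** 2 for j, c in zip(otus, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical toy data: rows are soil samples, columns are four protist OTUs.
raw_counts = [
    [50, 30, 20, 0],   # control
    [45, 35, 15, 5],   # control
    [10, 30, 20, 40],  # treated
    [5, 35, 15, 45],   # treated
]
labels = ["control", "control", "treated", "treated"]

profiles = [relative_abundance(row) for row in raw_counts]   # step (i)
indicators = select_indicator_otus(profiles, labels, top_k=2)  # step (ii)
model = fit_centroids(profiles, labels, indicators)          # step (iii)

new_sample = relative_abundance([8, 32, 18, 42])
diagnosis = predict(new_sample, model, indicators)
```

On this toy table the two OTUs that shift between conditions (indices 0 and 3) are retained, and the new sample is assigned to the "treated" group. Real metabarcoding data would involve thousands of OTUs, compositional effects, and uneven sequencing depth, so a production pipeline would need proper normalization, cross-validation, and a more robust model than this sketch.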