Comparison of two methods for detecting differential expression in a tilapia RNA-seq study
Reyna Cristina Colli-Dula, Nacira Anahi Albornoz-Abud
Cinvestav del IPN Unidad Mérida
Abstract
Benzo(a)pyrene (BaP) is a hydrocarbon released into aquatic systems by various anthropogenic activities. It can bioaccumulate in economically important aquatic organisms and, because it is considered carcinogenic and mutagenic, it may adversely affect the development of reproductive organs. Its early detection at the molecular level in aquatic organisms is therefore important, and RNA sequencing (RNA-seq), an omic technique, has revolutionized the characterization of changes in their transcriptome. However, data analysis methods for RNA-seq are being developed at an accelerating pace, and it is not yet clear which criteria should guide the choice of the most appropriate method, especially for users without a solid statistical or computational background. To help with this choice, we systematically compared two widely used pipelines, built around Cuffdiff and the Bioconductor package DESeq2 and supported by Rsubread, BioMart, CummeRbund and ClusterProfiler, for identifying differentially expressed genes (DEGs) between sample groups and for gene set enrichment analysis (GSEA). To provide general guidelines for choosing a robust pipeline, we applied both to tilapia exposed by intraperitoneal injection to repeated doses of BaP (3 mg/kg) over 26 days. Using adjusted p-values, Cuffdiff and DESeq2 identified 1,173 and 809 DEGs affected by BaP, respectively. We then explored the transcriptomic responses detected by each method. Both analyses showed some similar transcriptomic responses, but most results differed between the two methods. We identified 432 transcripts with the same expression patterns in both methods and used only these 432 DEGs for the GSEA with the KEGG database.
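The DEG-calling and overlap steps described above can be sketched in Python. This is a minimal, hypothetical illustration, not the authors' pipeline. It assumes each tool's output has already been reduced to a per-gene log2 fold change and raw p-value, applies the Benjamini-Hochberg correction (the default adjustment in DESeq2's `results()`; Cuffdiff reports its own FDR-adjusted q-values), calls DEGs at an assumed cutoff of 0.05 (the abstract does not state the threshold), and interprets "same expression patterns" as a shared direction of change. All gene names and values are invented.

```python
from typing import Dict, List, Set, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(results: Dict[str, Tuple[float, float]], alpha: float = 0.05) -> Dict[str, int]:
    """Map each significant gene to its direction of change (+1 up, -1 down).

    `results` maps gene -> (log2 fold change, raw p-value).
    alpha = 0.05 is an assumed cutoff, not taken from the study.
    """
    genes = list(results)
    padj = bh_adjust([results[g][1] for g in genes])
    return {
        g: 1 if results[g][0] > 0 else -1
        for g, q in zip(genes, padj)
        if q < alpha and results[g][0] != 0  # ignore genes with no fold change
    }


def concordant_degs(a: Dict[str, int], b: Dict[str, int]) -> Set[str]:
    """Genes called by both methods with the same direction of change."""
    return {g for g in a.keys() & b.keys() if a[g] == b[g]}


# Hypothetical toy values: gene -> (log2 fold change, raw p-value).
cuffdiff = {"geneA": (2.1, 0.0001), "geneB": (-1.5, 0.001), "geneC": (0.8, 0.04), "geneD": (1.2, 0.5)}
deseq2 = {"geneA": (1.9, 0.0002), "geneB": (1.1, 0.002), "geneC": (0.7, 0.3), "geneD": (-0.2, 0.9)}

degs_cuffdiff = call_degs(cuffdiff)
degs_deseq2 = call_degs(deseq2)
shared = concordant_degs(degs_cuffdiff, degs_deseq2)
print(len(degs_cuffdiff), len(degs_deseq2), sorted(shared))  # prints: 2 2 ['geneA']
```

In this toy example geneB is significant in both tables but changes in opposite directions, so it is excluded from the concordant set, while geneC falls just above the cutoff after adjustment. The sketch assumes gene identifiers have already been harmonized between the two tools' outputs.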
The enriched GO categories for biological processes were consistent between the ClusterProfiler package and the g:Profiler web platform. The most significant enriched biological-process terms included monocarboxylic acid metabolic process, acyl-CoA biosynthetic process and fatty acid metabolic process. For molecular function, significant enrichment was found for catalytic activity, oxidoreductase activity and carboxylic ester hydrolase activity; for cellular component, the enriched terms were endoplasmic reticulum membrane and endoplasmic reticulum subcompartment. The GSEA using the tilapia KEGG database revealed only two enriched pathways: glycerolipid metabolism and the PPAR signaling pathway. At the pathway level, both packages showed similar results. Our study demonstrates how the choice of data analysis tools can markedly affect the results, highlighting the importance of selecting the analysis method carefully.
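The enrichment step can be illustrated with an over-representation test. The sketch below assumes that the GO and KEGG enrichment of the DEG list was computed as a one-sided hypergeometric test against a background gene universe, followed by Benjamini-Hochberg correction across terms, which is how ClusterProfiler's over-representation functions work by default; strictly speaking, this differs from rank-based GSEA. The term names, gene sets and universe are hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_upper_tail(k: int, n_universe: int, n_term: int, n_list: int) -> float:
    """P(X >= k): chance of at least k term genes among n_list genes drawn
    without replacement from a universe of n_universe genes, n_term of which
    are annotated to the term."""
    total = comb(n_universe, n_list)
    hits = sum(
        comb(n_term, i) * comb(n_universe - n_term, n_list - i)
        for i in range(k, min(n_term, n_list) + 1)
    )
    return hits / total


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(degs: Set[str], universe: Set[str],
           terms: Dict[str, Set[str]]) -> Dict[str, Tuple[int, float, float]]:
    """Assumed over-representation test (not rank-based GSEA).

    Returns {term: (overlap, p-value, BH-adjusted p-value)}.
    """
    degs = degs & universe  # only genes in the background can be counted
    names = list(terms)
    overlaps, pvals = [], []
    for name in names:
        members = terms[name] & universe
        k = len(degs & members)
        overlaps.append(k)
        pvals.append(hypergeom_upper_tail(k, len(universe), len(members), len(degs)))
    padj = bh_adjust(pvals)
    return {n: (k, p, q) for n, k, p, q in zip(names, overlaps, pvals, padj)}


# Hypothetical example: 20-gene universe, 4 DEGs, two made-up terms.
universe = {f"g{i}" for i in range(1, 21)}
terms = {
    "lipid_term": {"g1", "g2", "g3", "g4", "g5"},
    "other_term": {"g11", "g12", "g13", "g14", "g15"},
}
degs = {"g1", "g2", "g3", "g10"}
for term, (k, p, q) in enrich(degs, universe, terms).items():
    print(f"{term}: overlap={k}, p={p:.4f}, padj={q:.4f}")
```

Here three of the four DEGs fall in the five-gene `lipid_term`, giving p = 155/4845 (about 0.032) before correction and about 0.064 after adjusting across the two terms. The choice of background universe strongly affects these p-values, which is one reason different enrichment tools can disagree.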