Cancer signatures for reproducible gene expression analysis data: the computational way to achieve precision medicine
Stefania Pirrotta, Laura Masatti, Fabiola Pedrini, Chiara Romualdi, Enrica Calura
University of Padova
Abstract
Cancer is a complex disease characterized by extensive genomic aberrations that markedly affect gene expression regulation and cellular processes. Many studies and some clinical trials have proposed gene expression signatures as valuable tools for understanding cancer mechanisms and defining subtypes. Moreover, transcriptional signatures can reveal cancer activities as they occur, which is essential for guiding therapeutic decisions and monitoring interventions. Many cancer features detectable by sequencing are important to clinicians for diagnosis, prognosis and prediction of treatment efficacy. However, a major problem is that most signatures proposed in the literature lack a computational implementation. Such an implementation would provide a detailed signature definition and would ensure the reproducibility, dissemination and usability of the classifier. To address this, we developed signifinder, an R package that collects and implements a compendium of cancer gene expression signatures, with the aim of ensuring easy and reproducible computation of signatures. We collected from the literature a list of gene expression signatures covering numerous cancer hallmarks. We established a set of stringent inclusion criteria: (i) signatures must address a cancer topic and must have been developed and used on cancer samples; (ii) they must be based on transcriptomic data from bulk samples; (iii) they must include the gene list and the method to calculate an expression-based single-sample score. In signifinder, a dedicated function implements each signature. Each signature requires its own data, such as a properly formatted input or the list of genes, possibly with their corresponding coefficients or attributes.
We collected and stored all this information inside the signifinder package, so that users can analyze their own normalized expression values without any additional information or data transformation. The package is designed to work with the most commonly used R data structures and Bioconductor data objects, and is therefore compatible with the most popular expression data analysis packages. It implements more than 40 expression-based signatures from the cancer literature. By analyzing expression data with the collected signatures, signifinder assigns each sample one score per signature. These scores summarize many different tumor aspects, such as the predicted response to therapy or the association with survival, and quantify multiple microenvironmental conditions, such as hypoxia or the activity of the immune response. For the first time, multiple cancer signatures can be computed easily and efficiently across multiple cancer types within a single analysis. Furthermore, to help researchers interpret the data, signifinder provides a toolbox of graphical visualizations for easy exploration and comparison of signature scores. With the promise of tailored and optimized predictions for individual cancer patients, the gene expression signatures collected in signifinder can help investigate tumor samples automatically, in an easier and more reproducible way, bringing us a step closer to precision medicine.
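To make the idea of an expression-based single-sample score concrete, the following is a minimal sketch of one common scoring scheme: a coefficient-weighted sum over the signature genes. It is written in Python for illustration only and is not signifinder's implementation or API. The function name, gene names, coefficients and expression values are all hypothetical, and the choice to skip signature genes missing from the data is an assumption.

```python
from typing import Dict


def weighted_signature_score(
    expression: Dict[str, Dict[str, float]],
    coefficients: Dict[str, float],
) -> Dict[str, float]:
    """Return one score per sample: the coefficient-weighted sum of expression.

    `expression` maps gene -> {sample -> normalized expression value}.
    Signature genes absent from `expression` are skipped (an assumption of
    this sketch; real signatures may handle missing genes differently).
    """
    present = [gene for gene in coefficients if gene in expression]
    if not present:
        raise ValueError("none of the signature genes are in the expression data")
    samples = list(expression[present[0]])
    return {
        sample: sum(coefficients[gene] * expression[gene][sample] for gene in present)
        for sample in samples
    }


# Hypothetical toy data: gene names, coefficients and values are invented.
toy_expression = {
    "GENE_A": {"sample_1": 2.0, "sample_2": 4.0},
    "GENE_B": {"sample_1": 1.0, "sample_2": 3.0},
    "GENE_C": {"sample_1": 5.0, "sample_2": 0.5},  # not part of the signature
}
toy_coefficients = {"GENE_A": 0.5, "GENE_B": -1.0, "GENE_D": 2.0}  # GENE_D is missing from the data

scores = weighted_signature_score(toy_expression, toy_coefficients)
print(scores)
```

Keeping the gene list and coefficients together with the scoring function is what allows a score to be recomputed from normalized expression values alone, which is the reproducibility argument made in the abstract.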