A Bioconductor workflow for dynamic spatial proteomics
Lisa M Breckels, Oliver Crook, Laurent Gatto, Kathryn S Lilley
University of Cambridge
Abstract
Spatial proteomics is the systematic study of proteins and their assignment to subcellular niches, including organelles. The field has continued to grow in importance as many diseases result from protein mislocalisation. Knowledge of a protein’s subcellular localisation is highly valuable to biologists because it can help elucidate the protein’s role within the cell: proteins are spatially organised according to their function and the specificity of their molecular interactions. Data from high-throughput mass spectrometry (MS) based methods (e.g. [1-6]) can be used to systematically localise thousands of proteins per experiment, across multiple conditions and states. Alongside technical advances in MS methods and multiplexing capabilities within quantitative proteomics workflows, we have over the last decade established an extensive and reliable set of software for analysing such data. Using infrastructure from MSnbase [7], the Bioconductor packages pRoloc [8], pRolocGUI [9], pRolocdata and bandle [10] provide a unifying framework for the analysis of data produced from spatial proteomics experiments. The packages pRoloc and pRolocGUI provide machine learning (ML) methods and interactive visualisation. Both classical ML and Bayesian methods are available for protein classification, and algorithms for novelty detection and transfer learning enable the incorporation of data from complementary technologies and other sources, such as freely available in-silico data. The suite has most recently been joined by the bandle package, which is used to determine the probability of protein re-localisation in comparative experiments. Many features, including the sister package pRolocGUI, have been added to pRoloc since our first pipeline was published in 2016 [11].
Using experimental data from triplicate dynamic experiments on THP-1 human leukaemia cells [12], we present here a modernised workflow for the complete analysis of dynamic MS-based proteomics, beginning with peptide spectrum matches and aggregating through to protein-level data. Finally, by exploiting the tools and methods available in bandle, we reveal spatiotemporal patterns that occur during a lipopolysaccharide (LPS)-induced inflammatory response.
References:
[1] Geladaki, A. et al. Nat Commun 10, 331 doi.org/10.1038/s41467-018-08191-w (2019)
[2] Christoforou, A. et al. Nat Commun 7:8992 doi.org/10.1038/ncomms9992 (2016)
[3] Itzhak, DN. et al. eLife 5, doi.org/10.7554/eLife.16950 (2016)
[4] Orre, LM. et al. Mol Cell Jan 3;73(1):166-182.e7 doi.org/10.1016/j.molcel.2018.11.035 (2019)
[5] Beltran, PMJ. et al. Cell Syst. doi.org/10.1016/j.cels.2016.08.012 (2016)
[6] Jadot, M. et al. Mol. Cell. Proteom. 16, 194–212 doi.org/10.1074/mcp.M116.064527 (2017)
[7] Gatto, L. et al. Bioinformatics Jan 15;28(2):288-9. doi.org/10.1093/bioinformatics/btr645 (2012)
[8] Gatto, L. et al. Bioinformatics May 1;30(9):1322-4. doi.org/10.1093/bioinformatics/btu013 (2014)
[9] Breckels, LM. et al. R package version 2.7.0, github.com/lgatto/pRolocGUI
[10] Crook, OM. et al. bioRxiv doi.org/10.1101/2021.01.04.425239 (2021)
[11] Breckels, LM. et al. F1000Research doi.org/10.12688/f1000research.10411.2 (2016)
[12] Mulvey, CM., Breckels, LM. et al. Nat Commun 12, 5773 doi.org/10.1038/s41467-021-26000-9 (2021)