Introduction: Using R for Cancer Informatics

R has numerous applications in medicine, ranging from data analysis and visualization to statistical modelling and machine learning.

One prominent use of R in medicine is the analysis and visualization of medical data. Medical researchers routinely accumulate extensive patient data, comprising demographic information, clinical measurements, and laboratory results. R gives researchers tools for analysing such data and producing visualizations that help identify trends and patterns.

For instance, R is well suited to analysing clinical trial data to assess the effectiveness of novel treatments against control groups. Using R's statistical modelling capabilities, researchers can fit regression models to the data and test whether outcomes differ significantly between treatment and control groups. R also supports visualizing study results with graphical representations such as bar plots, box plots, and scatter plots.
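The regression step described above can be sketched in base R as follows. This is a minimal illustration on simulated data, not an analysis of a real trial: the sample size, the variable names (`arm`, `age`, `outcome`), and the assumed treatment effect of 5 units are all hypothetical.

```r
# Simulated two-arm trial; every value here is invented for illustration.
set.seed(42)
n <- 200
trial <- data.frame(
  arm = factor(rep(c("control", "treatment"), each = n / 2),
               levels = c("control", "treatment")),
  age = round(rnorm(n, mean = 60, sd = 10))
)

# Assumption: treatment lowers the outcome by 5 units on average.
trial$outcome <- 50 + 0.2 * trial$age -
  5 * (trial$arm == "treatment") + rnorm(n, sd = 8)

# Linear regression of the outcome on treatment arm, adjusted for age.
fit <- lm(outcome ~ arm + age, data = trial)
print(summary(fit)$coefficients)

# Estimated treatment effect and its 95% confidence interval.
effect <- coef(fit)["armtreatment"]
ci <- confint(fit)["armtreatment", ]

# Box plot of the outcome by arm.
boxplot(outcome ~ arm, data = trial,
        ylab = "Outcome (simulated units)",
        main = "Simulated trial outcome by arm")
```

The row labelled `armtreatment` in the coefficient table gives the estimated difference between arms and its p-value. For a binary outcome such as response versus no response, `glm(..., family = binomial)` would take the place of `lm()`.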

Beyond clinical trials, R is widely used in other areas of medicine, including epidemiology, public health, and genomics. For example, researchers use R to analyse large genomic datasets to identify genetic mutations associated with specific diseases, and to find risk factors for chronic diseases in large population-based datasets.
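As a rough sketch of how a mutation-disease association screen might look, the base R example below tests each variant with Fisher's exact test and then corrects for multiple testing. The data are simulated, and the variant names, carrier rates, and the enrichment of `variant1` in cases are assumptions made purely for illustration.

```r
# Simulated case-control data; all rates and names are hypothetical.
set.seed(1)
n_patients <- 300
n_variants <- 50
disease <- rep(c(TRUE, FALSE), each = n_patients / 2)

# Carrier matrix: rows are patients, columns are variants (10% background rate).
carrier <- matrix(rbinom(n_patients * n_variants, 1, 0.10) == 1,
                  nrow = n_patients,
                  dimnames = list(NULL, paste0("variant", seq_len(n_variants))))

# Assumption: variant1 is enriched among cases (40% carrier rate).
carrier[disease, "variant1"] <- rbinom(sum(disease), 1, 0.40) == 1

# Fisher's exact test of carrier status against disease status, per variant.
p_values <- apply(carrier, 2, function(x) {
  tab <- table(factor(x, levels = c(FALSE, TRUE)),
               factor(disease, levels = c(FALSE, TRUE)))
  fisher.test(tab)$p.value
})

# Benjamini-Hochberg adjustment to control the false discovery rate.
adjusted <- p.adjust(p_values, method = "BH")
print(head(sort(adjusted), 5))
```

The adjustment step matters because testing many variants at once will produce small p-values by chance. Real studies would typically also account for confounders such as population structure, which this sketch ignores.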

The following examples show how R can be used in cancer informatics:

1. Genomic data analysis: R offers numerous packages for analysing genomic data, many of them distributed through the Bioconductor project, such as the widely used limma package for gene expression analysis. These packages allow researchers to preprocess and normalize raw genomic data, perform differential expression analysis to identify genes that are expressed differently in cancer and normal tissue, and identify gene pathways that are altered in cancer. R can also be used for downstream analysis, such as gene set enrichment analysis and pathway analysis.

2. Clinical data analysis: In addition to genomic data, clinical data such as patient demographics, treatment history, and disease outcomes are central to cancer informatics. R offers robust capabilities for analysing and visualizing clinical data, using techniques such as survival analysis to model patient outcomes or logistic regression to identify predictors of treatment response.

3. Imaging data analysis: R can be used to analyse and visualize medical images with image processing packages, many of which are catalogued in the CRAN Medical Imaging task view. Examples include oro.dicom and oro.nifti for reading DICOM and NIfTI files, EBImage (Bioconductor) for general image processing, and RNiftyReg for image registration. Such packages support image segmentation, feature extraction, and image registration, which are important steps in analysing medical images for cancer diagnosis and treatment.
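The survival analysis mentioned under clinical data analysis (item 2) can be sketched with the `survival` package, which is distributed with standard R installations. The example uses the package's bundled `lung` dataset (NCCTG advanced lung cancer data); the choice of covariates is an illustrative assumption, not a recommended model.

```r
library(survival)

# Bundled NCCTG lung cancer data.
# status: 1 = censored, 2 = dead; sex: 1 = male, 2 = female.
lung <- survival::lung

# Kaplan-Meier survival curves stratified by sex.
km <- survfit(Surv(time, status) ~ sex, data = lung)
plot(km, col = c("blue", "red"),
     xlab = "Days", ylab = "Survival probability")
legend("topright", legend = c("Male", "Female"),
       col = c("blue", "red"), lty = 1)

# Cox proportional hazards model with sex, age, and ECOG performance score.
# Rows with missing covariates are dropped by default.
cox <- coxph(Surv(time, status) ~ sex + age + ph.ecog, data = lung)

# Hazard ratios with 95% confidence intervals.
print(summary(cox)$conf.int)
```

Each row of the printed table gives a hazard ratio (`exp(coef)`) and its confidence bounds. A ratio below 1 indicates lower estimated hazard per unit increase in that covariate. Before relying on such a model, the proportional hazards assumption would normally be checked, for example with `cox.zph()`.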

Overall, R is a powerful tool for medical researchers and practitioners, providing sophisticated statistical modelling and visualization capabilities that can help improve patient outcomes and advance medical knowledge.

Sources & Further Reading

  • Warner JL, Klemm JD. Informatics Tools for Cancer Research and Care: Bridging the Gap Between Innovation and Implementation. JCO Clin Cancer Inform. 2020;4:784-786. doi:10.1200/CCI.20.00086

  • Kerlavage AR, Kirchhoff AC, Guidry Auvil JM, et al. Cancer Informatics for Cancer Centers: Scientific Drivers for Informatics, Data Science, and Care in Pediatric, Adolescent, and Young Adult Cancer. JCO Clin Cancer Inform. 2021;5:881-896. doi:10.1200/CCI.21.00040