Robustness of radiomic features in magnetic resonance imaging: review and a phantom study

Cattell, Renee; Chen, Shenglan; Huang, Chuan

doi:10.1186/s42492-019-0025-6

Original Article
Open access
Published: 20 November 2019

Visual Computing for Industry, Biomedicine, and Art volume 2, Article number: 19 (2019)

Abstract

Radiomic analysis has exponentially increased the amount of quantitative data that can be extracted from a single image. These imaging biomarkers can aid in the generation of prediction models aimed at advancing personalized medicine. However, the generalizability of such models depends on the robustness of the underlying features. The purpose of this study is to review the current literature regarding the robustness of radiomic features in magnetic resonance imaging. Additionally, a phantom study is performed to systematically evaluate the behavior of radiomic features under various conditions (signal to noise ratio, region of interest delineation, voxel size change and normalization methods) using intraclass correlation coefficients. The features extracted in this phantom study include first order, shape, gray level cooccurrence matrix and gray level run length matrix features. Many features are found to be non-robust to changing parameters. Feature robustness assessment prior to feature selection may be warranted, especially when combining multi-institutional data. Further investigation is needed in this area of research.

Introduction

Overview of radiomics

Radiomics is the extraction of high-dimensional and quantitative mineable data from digital medical images [1,2,3]. The prefix “radio-” refers to the use of radiological images; these digital medical images can come from various modalities, but are most frequently computed tomography (CT), positron emission tomography (PET) and magnetic resonance imaging (MRI) [1, 2]. Patients often receive numerous imaging studies to diagnose, stage, plan treatment and monitor disease progression. Currently in clinical practice, imaging data are used only qualitatively or semi-quantitatively, and the radiologist dictates a report. Radiomic analysis aims to maximize the amount of quantitative information that can be extracted from existing medical images, including information that may not be appreciable to the naked eye, adding valuable information that can be used for patient care. The digital image is analyzed by mathematical algorithms and/or filtering of the data to produce quantitative values. These features are termed quantitative imaging biomarkers and can be classified into two groups: semantic and agnostic.

Semantic features can be either qualitatively defined by a radiologist or quantitatively defined by a mathematical algorithm. Examples of semantic features include size, shape, location, vascularity and spiculation [1, 2]. These descriptors are commonly used by radiologists in a qualitative fashion to identify and characterize disease, as in breast tumors, where change in tumor size is used to assess treatment response (Response Evaluation Criteria in Solid Tumors) and spiculation indicates a higher likelihood of malignancy (Breast Imaging Reporting and Data System) [1, 4,5,6]. Quantitative extraction of semantic features is desired to give a more comprehensive and reproducible description of the region of interest (ROI), whereas visual inspection by radiologists has large intra- and inter-reader variability [5].

Agnostic features aim to quantify the heterogeneity within an ROI based on image intensity. They can be further broken down into first order, second-order and higher-order features:

First order features are commonly histogram-based and examine gray level signal intensity within an ROI independently of the spatial relationships between adjacent voxels. Examples of these features include uniformity, entropy, mean, median and kurtosis [1, 2].
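As a minimal illustration (a Python/NumPy sketch using textbook definitions, not the IBSI-compliant MATLAB software used in this study; the function name and the 64-bin histogram are assumptions for illustration), the features below depend only on the collection of intensities inside the ROI, so shuffling voxels within the ROI leaves every value unchanged.

```python
import numpy as np

def first_order_features(image, mask, n_bins=64):
    """A few histogram-based (first order) features of the intensities inside an ROI.

    Only the set of ROI intensities is used; their spatial arrangement is ignored.
    """
    values = np.asarray(image, dtype=float)[np.asarray(mask, dtype=bool)]
    mean = values.mean()
    std = values.std()
    # Normalized intensity histogram: bin probabilities that sum to 1.
    counts, _ = np.histogram(values, bins=n_bins)
    p = counts / counts.sum()
    p_nz = p[p > 0]  # skip empty bins so that log2 is defined
    return {
        "mean": float(mean),
        "median": float(np.median(values)),
        # Fourth standardized moment (non-excess kurtosis); undefined for a constant ROI.
        "kurtosis": float(((values - mean) ** 4).mean() / std ** 4) if std > 0 else float("nan"),
        "entropy": float(-(p_nz * np.log2(p_nz)).sum()),
        "uniformity": float((p ** 2).sum()),
    }
```

Some software reports excess kurtosis (the value above minus 3), which is one reason the exact feature definitions used should be reported.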

Second-order features, commonly referred to as “texture” features, examine the spatial relationships between gray level signal intensities by constructing a gray level dependence matrix [1, 2]. These features measure intra-region heterogeneity. They were first explored by Haralick et al. [7], who introduced the gray level cooccurrence matrix (GLCM), which counts the occurrences of different gray level voxel pairs in different directions. As radiomics has developed, these features have expanded to include other ways of quantifying spatial relationships between voxels, such as the gray level run length matrix (GLRLM), which quantifies the number of consecutive voxels with the same gray level [8], and the gray level zone length matrix, which quantifies the size of homogeneous areas of an image [9].
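The GLCM idea can be sketched in a few lines of Python (NumPy assumed). This is illustrative only and the function names are hypothetical: it counts gray level pairs for a single pixel offset within a 2D ROI, makes the matrix symmetric, normalizes it to probabilities and computes three classic Haralick-style features. Implementations differ in details such as symmetry and how directions are aggregated (this study merged four 2D directions, as described in Methods).

```python
import numpy as np

def glcm(levels, mask, offset=(0, 1), n_levels=8):
    """Symmetric, normalized GLCM for one pixel offset (dr, dc) within an ROI.

    `levels` holds discretized gray levels 1..n_levels. Only pairs with both
    pixels inside the mask are counted; at least one such pair must exist.
    """
    levels = np.asarray(levels)
    mask = np.asarray(mask, dtype=bool)
    dr, dc = offset
    rows, cols = levels.shape
    m = np.zeros((n_levels, n_levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols and mask[r, c] and mask[r2, c2]:
                m[levels[r, c] - 1, levels[r2, c2] - 1] += 1
    m = m + m.T  # count each pair in both directions (symmetric GLCM)
    return m / m.sum()

def glcm_features(p):
    """Three Haralick-style features of a normalized GLCM `p`."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float((p * (i - j) ** 2).sum()),
        "energy": float((p ** 2).sum()),  # angular second moment
        "homogeneity": float((p / (1 + np.abs(i - j))).sum()),
    }
```

On a vertically striped pattern, a horizontal offset pairs unequal gray levels (high contrast) whereas a vertical offset pairs equal ones (zero contrast), a directional property that first order features cannot capture.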

Higher-order features involve applying a filter or transformation to an image prior to feature extraction. These features aim to identify patterns or highlight details within the image that are not initially perceivable by the reader or are hard to interpret [1, 5]. Features computed after a wavelet transform are one example [10].
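Higher-order features start from a filtered image. As a hedged example, one level of an orthonormal 2D Haar wavelet decomposition can be written with NumPy slicing alone. Haar is chosen here only for brevity (radiomics software commonly offers other wavelet families), the function name is hypothetical, and the phantom study in this article does not use wavelet features. First order or texture features would then be computed on each sub-band.

```python
import numpy as np

def haar_dwt2(image):
    """One level of an orthonormal 2D Haar wavelet transform.

    Returns (ll, lh, hl, hh): the first letter is the filter applied along each
    row (horizontal direction), the second the filter applied along each column.
    Assumes even image dimensions.
    """
    x = np.asarray(image, dtype=float)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("image dimensions must be even")
    # Horizontal filtering: pairwise sums (low-pass) and differences (high-pass).
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    # Vertical filtering of both results.
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh
```

Because the transform is orthonormal, the total energy (sum of squared values) of the four sub-bands equals that of the input image; a constant image has all of its energy in the approximation band.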

As such, this analysis has exponentially increased the amount of information that can be extracted from a single digital image. A single image may contain valuable sub-visual information of the tissue pathophysiology, phenotype and microenvironment that can be captured by quantitative analysis [2].

The suffix “-omics” refers to combining the massive number of quantitative features that can be extracted from a single ROI using mathematical/statistical methods with clinical characteristics, for use in the clinical management of patients [1, 2]. A goal of radiomics is to identify robust and consistent imaging biomarkers to aid clinical decision making, such as the diagnosis of a disease, monitoring of treatment response or prediction of prognosis [1]. This is a step towards “precision” or “personalized” medicine, in which the large number of quantitative features from an individual's images, coupled with that individual's clinical characteristics (age, genomic profiling, etc.), can be used to tailor treatment or assess risk [1, 2, 5].

Oncological applications are a large area of study in radiomics, owing to the Quantitative Imaging Network, funded by the National Institutes of Health, and the Quantitative Imaging Biomarker Alliance, organized by the Radiological Society of North America [2, 5]. Cancer is a highly heterogeneous disease at both the inter-patient and intra-patient level [2, 11, 12], and radiomics has many oncological applications. A non-invasive imaging biomarker is needed to better characterize lesions, for example their aggressiveness, because a single needle biopsy cannot capture the entire landscape of a tumor [5]. For a more aggressive tumor, a more intensive treatment regimen could be tailored to the patient, potentially improving prognosis [11]. Additionally, characterizing a lesion as malignant or benign could help clinicians make a more informed diagnosis, reducing stress for the patient and identifying the correct course of action. Furthermore, radiomic analysis could aid in monitoring treatment response: current criteria mainly include size and shape changes, whereas subtle changes in image appearance that are not appreciable to the naked eye may be informative of response [5, 11]. For a clearly non-responding tumor, the patient could then be switched to a different, more effective therapy and avoid the side effects of a treatment from which no clinical benefit is expected.

As previously mentioned, radiologic images including CT, PET and MRI have been used in radiomics studies; in this article, we focus on MRI. Each modality has its own characteristics that could affect radiomic analysis. CT and PET voxel values have a physical meaning, namely the x-ray attenuation of tissue in Hounsfield units and cellular activity as the Standard Uptake Value, respectively. Conventional MRI signal intensities, by contrast, are relative values in arbitrary units that depend on the scanner, coil and acquisition parameters. Thus, the diagnostic or prognostic implications of radiomic analysis may be interpreted differently across modalities.

Radiomics in MRI

Overview

MRI is a commonly used modality for radiomic analysis owing to its rich contrast mechanisms (such as T1, T2, chemical exchange, diffusion, perfusion and contrast enhancement) and fine soft-tissue detail [13]. Most MRI radiomic analyses are performed in oncological applications such as head and neck, prostate, brain and breast cancer.

Head and neck cancer

Numerous studies have performed MRI radiomic analysis on head and neck cancer. Analyzed endpoints included pathological classification, segmentation and prognostic/predictive biomarkers of progression, survival or treatment, with reports of radiomic model performance showing promising results in most studies [13].

Prostate cancer

Multiparametric MRI is an important tool in the diagnosis of prostate cancer, with T2-weighted, dynamic contrast-enhanced and diffusion-weighted imaging being the core sequences in the Prostate Imaging Reporting and Data System [14]. In prostate cancer, radiomics has focused mainly on detection, with identification and delineation of the tumor region as the priority [15].

Brain cancer

MRI is standard of care for brain tumors, most commonly in the form of contrast-enhanced imaging, which identifies tumor areas through their leaky vasculature and breakdown of the blood-brain barrier. The main clinical applications of radiomics in brain cancer include prediction of prognosis (survival time), classification of glioblastoma subtypes and discrimination of radiation necrosis from recurrent tumor tissue [16].

Breast cancer

MRI is the modality of choice for assessing the extent of disease and monitoring treatment response in patients diagnosed with breast cancer. As in brain cancer, a dynamic contrast-enhanced series is commonly acquired to identify areas of increased, disorganized vascularity associated with malignancy. Studies have examined differentiating benign from malignant lesions and predicting treatment response, lymph node metastasis, molecular profile and risk of recurrence [17,18,19].

Others

Aside from oncological applications, radiomic analysis has been explored in other pathologies such as Alzheimer’s disease, multiple sclerosis, ischemic stroke and epilepsy [20,21,22,23].

Steps of MRI radiomics

Radiomic analysis of MRI generally consists of 4 main steps: image acquisition, ROI segmentation, feature extraction and feature selection.

Image acquisition factors include scanner (make, model, field), coil, sequence [sequence type, echo time (TE), repetition time (TR), acceleration, voxel size, bandwidth, etc.] and reconstruction algorithm (parallel imaging, compressed sensing, regularization parameters, coil combination, etc.).

ROI segmentation includes automatic, semi-automatic or manual delineation of the ROI in the image.

Feature extraction includes pre-processing steps (normalization, binning to a defined number of gray levels) and application of mathematical algorithms or filters to calculate the feature within the ROI.

Feature selection and model construction include dimensionality-reduction techniques to remove redundant features and selection by means of machine learning (least absolute shrinkage and selection operator, support vector machine, etc.).
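One common way to remove redundant features before model construction is a correlation filter. The Python/NumPy sketch below is illustrative: the function name, the 0.95 threshold and the greedy column order are assumptions, not a recommendation from this study. It keeps a feature only if it is not strongly correlated with any feature already kept.

```python
import numpy as np

def drop_correlated(features, names, threshold=0.95):
    """Greedy redundancy filter based on absolute Pearson correlation.

    features: (n_samples, n_features) array whose columns are non-constant.
    A column is kept only if |r| <= threshold with every previously kept column.
    Returns the names of the retained features in their original order.
    """
    x = np.asarray(features, dtype=float)
    r = np.abs(np.corrcoef(x, rowvar=False))  # feature-by-feature correlations
    kept = []
    for j in range(x.shape[1]):
        if all(r[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]
```

Because the filter is greedy, the result depends on column order; sorting columns by robustness (for example, by ICC) beforehand would make it keep the more stable member of each correlated pair.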

Changing parameters at any step in this process could result in different feature values and thus reduce the consistency and reliability of predictive performance. Although many of the parameters in this pipeline are easy to standardize, some of them are subject to considerable variability in MRI radiomics.

Feature robustness in MRI radiomics

Importance of robustness of features in medical imaging

A fundamental requirement for drawing reliable conclusions from any radiomic imaging biomarker is that its value be stable under different conditions and that two measurements obtained under the same conditions be consistent [24]. There is currently no consensus on how to assess the robustness of radiomic features [25,26,27,28,29,30] (others refer to it as “stability” [31,32,33,34,35,36], “reproducibility” [26, 37,38,39,40] or “repeatability” [24, 38, 41]). However, the image biomarker standardization initiative (IBSI) [42] recommends performing a feature robustness assessment prior to feature selection. Robustness does not guarantee a feature's discriminative power, so predictive performance should still be investigated [24]. Moreover, feature robustness could be application dependent [43]: a feature found to be highly precise for one dataset/disease could have poor stability for another. Several studies [24, 28, 32, 37] emphasized that feature pre-selection based on stability should be performed to generate more reliable results and reduce data dimensionality.

Robustness analysis in MRI

Most existing publications assessing image biomarker robustness investigated radiomic features from CT and PET images [30, 44,45,46,47,48]. A 2016 review [49] stated that “the repeatability of MR-based radiomic features has not been investigated”. Since then, several studies have investigated the robustness of MRI radiomic analysis, but the lack of standardization frequently leads to inconsistent conclusions. We performed a literature search for peer-reviewed full-text articles that analyzed feature robustness in MRI and summarized them in Table 1 (16 on human subjects and 5 exclusively on phantoms). These publications have assessed parameters such as vendor [33, 40, 51], scanner [31, 33], acquisition parameters [52, 59], observers [26, 37, 39, 50] and pre-processing parameters [24, 38, 50, 53, 54]; however, much remains to be investigated.

Table 1 Summary of literature for magnetic resonance imaging radiomics feature robustness


Several studies also highlighted the importance of complete and clear reporting. IBSI [42] presented informative reporting guidelines on image pre-processing and feature extraction. Additionally, the D-Lab proposed the radiomics quality score [43], which assigns a score based on 16 key points in the reporting of radiomics studies. Judged against these two standards, many studies lacked a clear and concise description of (1) the software implementation (e.g., chosen parameter settings, equations), (2) pre-processing steps (e.g., normalization, quantization) and (3) the statistical methods used to quantify or assess feature robustness [e.g., the form of intraclass correlation coefficient (ICC)]. In addition, many of these studies lacked an external validation set, an important component of feature robustness analysis.

We believe one way to improve the robustness analysis of MRI radiomics studies is to systematically evaluate the behavior of radiomic features under various conditions. With a well-defined “dictionary” of robust features, researchers can perform a pre-selection step tailored to their specific application. Here, we demonstrate such an effort by evaluating feature robustness to MRI signal to noise ratio (SNR), ROI delineation, small voxel size variation and normalization method through a phantom study. The workflow of the study is displayed in Fig. 1. We measure the degree of robustness using the ICC (2-way mixed-effects model, single rater, absolute agreement) and separate features into three groups based on ICC value: high (> 0.9), moderate (0.5–0.9) and low (< 0.5) for each of the conditions investigated.
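The grouping rule can be expressed directly. In this sketch, placing values exactly at 0.9 or 0.5 in the moderate group is our assumption, since the boundaries are not specified, and `preselect` is a hypothetical helper illustrating a pre-selection step driven by such a dictionary of per-feature ICC values.

```python
def robustness_group(icc):
    """Map an ICC value to the groups used here: high (> 0.9), moderate (0.5-0.9), low (< 0.5).

    Boundary values (exactly 0.9 or 0.5) are assigned to "moderate" (an assumption).
    """
    if icc > 0.9:
        return "high"
    if icc >= 0.5:
        return "moderate"
    return "low"

def preselect(icc_by_feature, keep=("high",)):
    """Return, sorted by name, the features whose robustness group is in `keep`."""
    return sorted(name for name, icc in icc_by_feature.items()
                  if robustness_group(icc) in keep)
```

Since robustness can be application dependent, such a dictionary would be built per condition (SNR, ROI delineation, voxel size) and per application before any discriminative feature selection.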

Results and discussion

SNR

In MRI, many factors affect image SNR even when all acquisition parameters are set to the same values and acquisitions are performed on the same scanner. Examples include coil loading, analog-to-digital gain, shimming, reconstruction method and patient size. In fact, because of the inhomogeneity of coil sensitivity, SNR can vary even within a single image slice, owing to both the B1+ (transmit) and B1- (receive) properties of the coil. In this study, we systematically evaluate the effect of several SNR levels using phantom data with added Gaussian noise. We also analyze the effect of two normalization methods on the radiomic results.

T2-weighted phantom images used in the analysis are shown in Fig. 2a, with ROIs drawn on a pineapple core (red), banana (blue), orange (orange) and kiwi (green). The regions of interest used in the SNR calculation are shown in Fig. 2b.

Complex Gaussian noise was added to the original image (Fig. 3c), and magnitude images were used for the analysis. Two noise levels [SNR 45 (Fig. 3a) and SNR 75 (Fig. 3b)] were generated from the original image, whose SNR is 124. To the naked eye, there is little visible difference between SNR 45 and SNR 75. These SNR levels are representative of those seen in clinical images. As mentioned above, SNR varies spatially in MRI; the SNR values used here simply represent the overall noise level of the image.
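A minimal Python/NumPy sketch of degrading an image to a target SNR follows. It assumes that SNR is defined as mean signal over the per-channel noise standard deviation and that the existing and added noise are independent, so their variances add; the exact procedure used in the study may differ, and the function name is hypothetical. The magnitude image is treated as the real channel, independent Gaussian noise is added to both channels and the magnitude is taken, so low-signal regions become Rician (in air, Rayleigh) distributed.

```python
import numpy as np

def add_complex_noise(image, signal_mean, snr_current, snr_target, rng=None):
    """Lower the SNR of a magnitude image by adding complex Gaussian noise.

    Assumptions (illustrative): SNR = signal_mean / sigma, with sigma the
    per-channel noise SD, and existing and added noise are independent.
    """
    if snr_target >= snr_current:
        raise ValueError("adding noise can only lower the SNR")
    rng = np.random.default_rng() if rng is None else rng
    sigma_now = signal_mean / snr_current
    sigma_goal = signal_mean / snr_target
    # Independent noise variances add, so add only the missing variance.
    sigma_add = np.sqrt(sigma_goal ** 2 - sigma_now ** 2)
    x = np.asarray(image, dtype=float)
    noisy = (x + rng.normal(0.0, sigma_add, x.shape)) + 1j * rng.normal(0.0, sigma_add, x.shape)
    return np.abs(noisy)  # magnitude image, as used for the analysis
```

Ten realizations at SNR 45 would then correspond to ten different seeds, e.g. `[add_complex_noise(img, s, 124, 45, np.random.default_rng(seed)) for seed in range(10)]`, where `img` and `s` (the mean signal in the reference region) are placeholders.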

Shape features were omitted from this part of the analysis because the same ROI was used across all SNR levels; this portion of the study aimed to analyze only the effect of added noise, not intra- or inter-reader variability in ROI delineation. Details of the study are described in the Methods section. In summary, the three most commonly used feature types (first order, GLCM and GLRLM features) were studied using 10 different noise realizations and 2 different normalization techniques. Features within each group and their respective ICCs (2-way mixed-effects model, single rater, absolute agreement) are listed in Table 2. The results using the first normalization technique (mean ± 3SD) are shown in Table 3 and Fig. 4a. Most first order features (11 of 13) have an ICC greater than 0.9, indicating high robustness to added noise. However, only 5 of 22 GLCM features have an ICC greater than 0.9; most GLCM features (14 of 22) show moderate robustness (ICC between 0.5 and 0.9). All GLRLM features show moderate robustness (0.5–0.9).

Table 2 Average of intraclass correlation coefficient value over 10 noise realizations in reference to variation in signal to noise ratio, region of interest dilation/erosion and small variation in voxel size


Table 3 Number of features of high, moderate and low robustness in each feature class, as defined by average of intraclass correlation coefficient over 10 noise realizations, in reference to signal to noise variation with normalization of mean ± 3SD or zero to maximum


Second order texture features, namely GLCM and GLRLM features, are affected by the normalization procedure. The SNR analysis above used mean ± 3SD normalization; the analysis was repeated using zero to maximum normalization. Each method has its limitations. Because mean ± 3SD normalization maps a narrower intensity range onto the same number of gray levels, it should separate intensities more finely than zero to maximum normalization, making it more sensitive to small intensity changes; for the same reason, it is also more likely to be sensitive to noise. Results using zero to maximum normalization are summarized in Table 3 and Fig. 4b. First order features are not affected by normalization/quantization because they are computed directly from the individual intensity values. Compared with the mean ± 3SD method, GLCM features show a trend toward higher ICC values, with no features in the low robustness group (ICC < 0.5). GLRLM features show a similar trend, with a higher proportion of features in the high robustness category (ICC > 0.9). The full list of features and their ICC values is given in Table 2. Clustering is observed in the ICC plots; we hypothesize that this is because (1) a limited number of ROIs are being compared and (2) the calculated features may be highly correlated.
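The two rescaling schemes can be sketched as follows (Python/NumPy; the function name, clipping of values outside mean ± 3SD and numbering gray levels from 1 are our assumptions, since implementations differ). With the same 64 levels, a narrow mean ± 3SD window yields much narrower bins than a zero to maximum window, which is why the choice changes texture matrices but not first order features computed from raw intensities.

```python
import numpy as np

def discretize(values, n_levels=64, method="mean3sd"):
    """Rescale ROI intensities and bin them into gray levels 1..n_levels.

    method="mean3sd": window [mean - 3 SD, mean + 3 SD]; values outside are clipped.
    method="zeromax": window [0, max].
    """
    v = np.asarray(values, dtype=float)
    if method == "mean3sd":
        lo, hi = v.mean() - 3 * v.std(), v.mean() + 3 * v.std()
    elif method == "zeromax":
        lo, hi = 0.0, v.max()
    else:
        raise ValueError(f"unknown method: {method}")
    if hi <= lo:
        return np.ones(v.shape, dtype=int)  # constant ROI: a single gray level
    scaled = (np.clip(v, lo, hi) - lo) / (hi - lo)  # now in [0, 1]
    # Equal-width bins; the window maximum is assigned to the top level.
    return np.minimum(np.floor(scaled * n_levels).astype(int) + 1, n_levels)
```

For an ROI whose intensities cluster far from zero, zero to maximum normalization can place nearly all voxels in one or two gray levels, whereas mean ± 3SD spreads them over many levels.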

ROI delineation

In practice, intra- and inter-reader variability in the manual segmentation of ROIs is inevitable. Subjective determination of abnormal tissue may not be consistent across readers because of differences in experience or in contrast windowing. We therefore also studied the effect of ROI dilation and erosion to evaluate feature robustness to ROI variations.

Two types of ROI manipulations were performed: dilation (by 1 pixel) and erosion (also by 1 pixel) as shown in Fig. 5. Similar to above, analysis was performed using 2 different normalization techniques: mean ± 3SD and zero to maximum.
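One-pixel dilation and erosion with a square 3 × 3 structuring element amount to taking the logical OR or AND over each pixel's 3 × 3 neighborhood. The NumPy-only sketch below (hypothetical function names, not the MATLAB code used in the study) illustrates this; it treats pixels beyond the image border as background, which may differ from MATLAB's border handling but is irrelevant for ROIs away from the image edge.

```python
import numpy as np

def _shifted_stack(mask):
    """All 9 shifts of a boolean mask over a 3x3 neighborhood (zero padded)."""
    padded = np.pad(np.asarray(mask, dtype=bool), 1, constant_values=False)
    rows, cols = padded.shape[0] - 2, padded.shape[1] - 2
    return np.stack([padded[dr:dr + rows, dc:dc + cols]
                     for dr in range(3) for dc in range(3)])

def dilate(mask):
    """Binary dilation by one pixel with a square 3x3 structuring element."""
    return _shifted_stack(mask).any(axis=0)

def erode(mask):
    """Binary erosion by one pixel with a square 3x3 structuring element."""
    return _shifted_stack(mask).all(axis=0)
```

Dilation can only add pixels outside the original ROI, while erosion can only remove pixels from it, which underlies the different behavior of the two manipulations discussed below.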

For ROI erosion with mean ± 3SD normalization, results are summarized in Table 4 and Fig. 6a. All 10 shape features and 20 of 22 GLCM features are highly robust. However, only 10 of 13 first order features and 6 of 11 GLRLM features are highly robust to ROI erosion. No feature has an ICC below 0.5. Results using zero to maximum normalization are summarized in Table 4 and Fig. 6b. By definition, first order and shape features are not affected by normalization differences. GLRLM features show an upward trend in robustness: with zero to maximum normalization, all GLRLM features are highly robust to ROI erosion.

Table 4 Number of features of high, moderate and low robustness in each feature class, as defined by average of intraclass correlation coefficient over 10 noise realizations in reference to erosion of region of interest with normalization of mean ± 3SD or zero to maximum


For ROI dilation, mean ± 3SD normalization results are summarized in Table 5 and Fig. 6c. Shape features are highly robust. However, the other feature classes are less robust, with only 7 of 13, 15 of 22 and 7 of 11 features having an ICC greater than 0.9 in the first order, GLCM and GLRLM groups, respectively. Table 2 lists individual features and their ICC values. Zero to maximum normalization results are summarized in Table 5 and Fig. 6d; ICC values trend upward with zero to maximum normalization. The ICC plots show the same clustering described previously.

Table 5 Number of features of high, moderate and low robustness in each feature class, as defined by average of intraclass correlation coefficient over 10 noise realizations in reference to dilation of region of interest with normalization of mean ± 3SD or zero to maximum


As expected, dilation resulted in poorer robustness than erosion, because dilation may incorporate tissue outside the ROI, whereas erosion only includes voxels from the original ROI. In our study, dilation of the ROI may include "fruit skin", which can differ markedly in appearance from the interior, or surrounding air. In a non-phantom study, such as one with a tumor ROI, overestimation or dilation of the ROI would likely include surrounding tissue rather than air. However, some tumors are located next to air cavities, as in nasopharyngeal cancer, so the robustness of features to dilation may be application dependent. This comparison indicates that it may be more beneficial to be conservative when defining an ROI.

Small voxel size variation

To accommodate different patient sizes, technologists commonly adjust the field of view (FOV) on the fly without changing other parameters. Strictly speaking, however, changing the FOV will also affect other parameters, such as TE, bandwidth and gradient slew rate, which in turn affect SNR. The effect of these small voxel size variations on radiomic feature robustness is understudied. In this part of the study, voxel size variation was introduced by acquiring images with slight changes in FOV and matrix size. To remove the effect of SNR variations caused by pixel size changes, all images were normalized to the same SNR. Previous studies have tried to address this problem by interpolation; however, interpolation introduces other complications and affects feature robustness [27].
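To first order, ignoring relaxation and coil effects, the standard dependence of MRI SNR on voxel size and sampling shows why an FOV or matrix change alters SNR. With readout and phase-encoding matrix sizes \( N_x \) and \( N_y \), slice thickness \( \Delta z \), number of averages \( N_{\mathrm{avg}} \) and pixel bandwidth \( \mathrm{BW}_{\mathrm{px}} \):

\[ \Delta x = \frac{\mathrm{FOV}_x}{N_x}, \qquad \Delta y = \frac{\mathrm{FOV}_y}{N_y}, \qquad \mathrm{SNR} \propto \Delta x\,\Delta y\,\Delta z\,\sqrt{\frac{N_y\,N_{\mathrm{avg}}}{\mathrm{BW}_{\mathrm{px}}}} \]

For example, increasing both \( N_x \) and \( N_y \) by a factor \( r \) at fixed FOV, slice thickness and pixel bandwidth scales SNR by \( r^{-2}\sqrt{r} = r^{-3/2} \). Normalizing all images to a common SNR removes this dependence, so that the remaining feature differences reflect the change in voxel size itself.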

The same slice was acquired with 4 different in-plane resolutions of 0.47, 0.50, 0.56 and 0.67 mm (Fig. 7a-d, respectively), keeping all other parameters the same where possible. The SNR of each image was normalized to 75 by adding Gaussian noise, and 10 different noise realizations were generated numerically. Results with mean ± 3SD normalization are summarized in Table 6 and Fig. 8a. Even though minor voxel size variation affects the ROI, which in turn affects shape features, all shape features were robust to minor voxel size variations. In the first order, GLCM and GLRLM groups, 8 of 13, 12 of 22 and 6 of 11 features, respectively, are highly robust to small differences in voxel size. Individual feature ICCs are reported in Table 2. Results for zero to maximum normalization are summarized in Table 6 and Fig. 8b; similar upward trends in the ICCs of GLCM and GLRLM features are noted, along with the same clustering in the ICC plots described previously.

Table 6 Number of features of high, moderate and low robustness in each feature class, as defined by average of intraclass correlation coefficient over 10 noise realizations in reference to pixel size with normalization of mean ± 3SD or zero to maximum


Small variations in voxel size do not produce a large visual difference, yet, as reported here, they do change the extracted radiomic features. Because even small variations in voxel size reduce robustness, larger differences in voxel size are expected to be of even greater concern. This is especially relevant to multi-institutional studies, which commonly include a wide range of voxel sizes.

Limitations

Our study has several limitations. First, results from a phantom study cannot always be transferred to clinical studies. However, the robustness of radiomic features is application dependent, and phantoms can still be used to investigate a feature pre-selection pipeline. One way to demonstrate the transferability of a phantom study is to compare the variability of each feature obtained from the phantom with that calculated from tumors [60]. Second, we investigated only one sequence on one scanner. Although there are fundamental differences between scanners, inter-scanner variability could be addressed if the bias is corrected in the image preprocessing step [51]. Lastly, we investigated only 2D radiomic features of certain classes. Future work should explore the robustness of 3D features, including filter-based features, from multi-scanner images combined with clinical data.

Conclusions

Radiomic analysis is a step towards personalized medicine through an exponential increase in the amount of quantitative data that can be extracted from medical images. In the current literature, feature robustness in MRI is understudied and feature extraction techniques are not universally standardized. Systematic evaluation of feature robustness is needed to ensure that a predictive biomarker is reproducible and generalizable, especially across institutions where acquisition parameters can vary widely. An application-based feature pre-selection step will be pivotal for incorporating radiomics-based tools into the clinic.

Methods

Phantom MR imaging

A pineapple, a gold kiwi, an orange, a banana and a strawberry placed on a Styrofoam box served as the radiomics phantom for our study. All images were acquired on a 3 T Siemens scanner (Biograph mMR) with a T2-weighted Turbo Spin Echo sequence using a 12-channel PET-compatible head coil. Acquisition parameters: echo train length = 18, TE = 98 ms, TR = 7360 ms, slice thickness/gap = 2/0 mm, pixel bandwidth = 219 Hz, flip angle = 150°, 100% phase sampling, 100% phase FOV, body coil transmission, 1 average. Different in-plane (axial) resolutions were acquired by changing the matrix size and FOV, with parameters listed in Table 7.

Table 7 Voxel size, matrix size and field of view used in the voxel size variation analysis


Image segmentation

First, the image was segmented manually on one slice of Series 2 using ITK-SNAP (version 3.6.0; http://www.itksnap.org). The ROIs of the different fruits were then propagated by linear interpolation to the same slice in the remaining series using MATLAB R2019a. To keep the ROIs conservative, the interpolated masks were thresholded at 1. All interpolated ROIs were visually checked and corrected manually to exclude the fruit/air interface and discontinuities.

Image processing

To calculate the SNR of the original image, the mean intensity of a homogeneous region within an ROI (kiwi) was divided by a noise estimate derived from the mean intensity of the background; both regions are shown in Fig. 2b. Because the background of a magnitude image follows a Rayleigh distribution with mean \( \sqrt{\pi /2}\ \sigma \), where σ is the mode (equal to the standard deviation of the underlying Gaussian noise), the background mean was divided by \( \sqrt{\pi /2} \) to estimate σ. Complex Gaussian noise was added to the original image and magnitude images were used for the analysis. Two noise levels (SNR 45 and 75) were generated from the original image, whose SNR is 124. Ten different noise realizations were generated numerically for each SNR level to emulate test-retest imaging. Built-in MATLAB imdilate and imerode functions with a 3 × 3 structuring element were used to dilate and erode the ROIs. All preprocessing was implemented in MATLAB (MATLAB R2019a).
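The background-based SNR estimate can be sketched as follows (Python/NumPy; the function name is hypothetical and this is not the authors' MATLAB code). It assumes the background noise is Rayleigh distributed, which strictly holds for single-channel magnitude data; multi-channel sum-of-squares reconstructions produce a chi-distributed background, for which the \( \sqrt{\pi /2} \) factor is only approximate.

```python
import numpy as np

def estimate_snr(image, signal_mask, background_mask):
    """Estimate magnitude-image SNR from a homogeneous signal ROI and an air ROI.

    In air, the magnitude of complex Gaussian noise is Rayleigh distributed with
    mean sqrt(pi/2) * sigma, so sigma is recovered from the background mean.
    """
    x = np.asarray(image, dtype=float)
    signal = x[np.asarray(signal_mask, dtype=bool)].mean()
    background_mean = x[np.asarray(background_mask, dtype=bool)].mean()
    sigma = background_mean / np.sqrt(np.pi / 2)  # Rayleigh correction
    return signal / sigma
```

Using the background mean rather than the background standard deviation avoids the additional correction that the Rayleigh standard deviation (about 0.655 σ) would require.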

Feature extraction

A set of 56 features was extracted using IBSI-compliant in-house software (in MATLAB) partially adapted from the Vallieres radiomics toolbox [61] and ImFEATbox [62]. Features are summarized in Table 2. Thirteen features were based on first order statistics and 10 on 2D shape, while texture features were computed from the grey-level co-occurrence matrix (GLCM, 22 features) and the grey-level run-length matrix (GLRLM, 11 features), each merged from all four 2D directional matrices. Definitions of the first order and texture features can be found in Parmar et al [63], and definitions of the 2D shape features in Griethuysen et al [64]. Both first order and 2D shape features were implemented directly in MATLAB from their definitions. For texture features, GLCM and GLRLM matrix computation and GLRLM feature extraction were adapted from the Vallieres radiomics toolbox, while GLCM features were adapted from ImFEATbox. Prior to calculating the texture matrices, all images underwent intensity discretization to 64 levels based on IBSI recommendations, with intensity values rescaled by mean ± 3SD or zero to maximum intensity (to assess texture feature robustness on different discretization scales).
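A straightforward (loop-based, unoptimized) Python/NumPy sketch of a GLRLM merged over the four 2D directions follows. It is not the Vallieres toolbox code: merging by summing the four directional matrices before computing features is one common convention, the function names are hypothetical, and the two features shown (short and long run emphasis) are standard GLRLM definitions chosen for illustration. Gray levels are assumed to be already discretized to 1..n_levels.

```python
import numpy as np

DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1)]  # 0, 45, 90 and 135 degrees

def glrlm(levels, mask, n_levels, max_run=None):
    """GLRLM summed over the four 2D directions; entry [g-1, L-1] counts runs of level g with length L."""
    levels = np.asarray(levels)
    mask = np.asarray(mask, dtype=bool)
    rows, cols = levels.shape
    max_run = max_run or max(rows, cols)
    m = np.zeros((n_levels, max_run))

    def same(r, c, g):
        # True if (r, c) is inside the image and the ROI and has gray level g.
        return 0 <= r < rows and 0 <= c < cols and mask[r, c] and levels[r, c] == g

    for dr, dc in DIRECTIONS:
        for r in range(rows):
            for c in range(cols):
                if not mask[r, c]:
                    continue
                g = levels[r, c]
                if same(r - dr, c - dc, g):
                    continue  # not the start of a run in this direction
                length = 1
                while same(r + length * dr, c + length * dc, g):
                    length += 1
                m[g - 1, length - 1] += 1
    return m

def run_emphasis(m):
    """Short and long run emphasis of a run length matrix."""
    lengths = np.arange(1, m.shape[1] + 1)
    n_runs = m.sum()
    return {
        "short_run_emphasis": float((m / lengths ** 2).sum() / n_runs),
        "long_run_emphasis": float((m * lengths ** 2).sum() / n_runs),
    }
```

A quick consistency check: every ROI pixel belongs to exactly one run per direction, so summing run length times count over the merged matrix returns four times the number of ROI pixels.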

Robustness analysis

Feature robustness across different SNRs, acquisition voxel sizes and ROI transformations was assessed using the ICC, assuming these variations introduce no consistent bias across ROIs. Each noise level, voxel dimension and ROI transformation was treated as a rater, and each intensity mask (containing the intensities of the selected voxels) as a subject. Based on ICC reporting guidelines [65], ICC (2,1) was selected (“2-way mixed-effects model, single rater, absolute agreement”), because features are considered stable if their values remain the same across different variations. ICCs were calculated in MATLAB (MATLAB R2019a). For the SNR and ROI dilation/erosion analyses, 5 ROIs were analyzed at a single image resolution (0.5 mm × 0.5 mm × 2.0 mm) with 10 different noise realizations, resulting in 50 samples per image. Two groups were compared in each case (SNR = 45 versus SNR = 75, original versus eroded ROI, original versus dilated ROI). For the voxel size analysis, 5 ROIs were analyzed with 10 different noise realizations, resulting in 50 samples per image, across 4 different in-plane resolutions (0.47, 0.50, 0.56 and 0.67 mm). ICC was assessed between groups for each calculated feature.
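For readers reproducing this analysis outside MATLAB, a minimal Python/NumPy sketch of the single-rater, absolute-agreement ICC computed from two-way ANOVA mean squares is given below. This formula is shared by the two-way random-effects (Shrout and Fleiss ICC(2,1)) and two-way mixed-effects absolute-agreement (McGraw and Wong ICC(A,1)) models; the function name is hypothetical and this is not the authors' code. Rows are subjects (here, intensity masks) and columns are raters (here, conditions such as SNR 45 and SNR 75).

```python
import numpy as np

def icc_a1(data):
    """Single-rater, absolute-agreement ICC from a two-way ANOVA.

    data: (n_subjects, k_raters) array with n >= 2 and k >= 2.
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)  # between subjects
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)  # between raters (conditions)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))  # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

For the SNR analysis, the input for one feature would be a 50 × 2 array (5 ROIs × 10 noise realizations by two SNR levels). Unlike a consistency ICC, this form penalizes a systematic offset between conditions: two otherwise identical columns shifted by a constant give an ICC below 1.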

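The eroded and dilated ROIs used in the delineation comparison can be illustrated with binary morphology. The exact margin and structuring element used in the study are not stated in this section, so the sketch below assumes a one-pixel, in-plane (2D) erosion and dilation with a 4-connected (cross-shaped) structuring element; the names `erode` and `dilate` are hypothetical.

```python
# Cross-shaped 3 x 3 structuring element (the centre plus its 4-connected neighbors); an assumption.
NEIGHBORS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def dilate(mask):
    """A pixel is in the dilated ROI if any pixel under the structuring element is in the ROI."""
    rows, cols = len(mask), len(mask[0])
    return [[any(0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
                 for dr, dc in NEIGHBORS)
             for c in range(cols)] for r in range(rows)]


def erode(mask):
    """A pixel stays in the eroded ROI only if every pixel under the element is in the ROI.

    Pixels outside the image are treated as background.
    """
    rows, cols = len(mask), len(mask[0])
    return [[all(0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
                 for dr, dc in NEIGHBORS)
             for c in range(cols)] for r in range(rows)]
```

For a 3 × 3 square ROI centred in a 5 × 5 image, `erode` keeps only the centre pixel and `dilate` grows the ROI to 21 pixels; features extracted from the original, eroded and dilated masks would then form the rater columns of the ICC table.
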
Availability of data and materials

All data will be provided upon written request.

Abbreviations

1-NN:: 1-nearest neighbor
2D:: Two-dimensional
ACC:: Average correlation coefficient
ADC:: Apparent diffusion coefficient
CCC:: Concordance correlation coefficient
CT:: Computed tomography
CV:: Coefficient of variation
DCE:: Dynamic contrast-enhanced
DISCO:: Differential subsampling with Cartesian ordering
DR:: Dynamic range
DSC:: Dice similarity coefficient
DWI:: Diffusion-weighted imaging
FLAIR:: Fluid-attenuated inversion recovery
FOV:: Field of view
GLCM:: Gray level cooccurrence matrix
GLRLM:: Gray level run length matrix
IBSI:: Image biomarker standardization initiative
ICC:: Intraclass correlation coefficient
k-NN:: k nearest neighbor
LDA:: Linear discriminant analysis
LoG:: Laplacian of Gaussian
MRI:: Magnetic resonance imaging
N/A:: Not applicable
NAs:: Number of acquisitions
PDW:: Proton density weighted
PE:: Peak enhancement
PET:: Positron emission tomography
POE:: Probability of error
ROC:: Receiver operating characteristic
ROI:: Region of interest
SBW:: Sampling bandwidth
SER:: Signal enhancement ratio
SNR:: Signal to noise ratio
TE:: Echo time
TR:: Repetition time
wCV:: Within-subject coefficient of variation

References

1. Rizzo S, Botta F, Raimondi S, Origgi D, Fanciullo C, Morganti AG et al (2018) Radiomics: the facts and the challenges of image analysis. Eur Radiol Exp 2(1):36. https://doi.org/10.1186/s41747-018-0068-z
2. Gillies RJ, Kinahan PE, Hricak H (2016) Radiomics: images are more than pictures, they are data. Radiology 278(2):563–577. https://doi.org/10.1148/radiol.2015151169
3. Kumar V, Gu YH, Basu S, Berglund A, Eschrich SA, Schabath MB et al (2012) Radiomics: the process and the challenges. Magn Reson Imaging 30(9):1234–1248. https://doi.org/10.1016/j.mri.2012.06.010
4. Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R et al (2009) New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 45(2):228–247. https://doi.org/10.1016/j.ejca.2008.10.026
5. Aerts HJ (2016) The potential of radiomic-based phenotyping in precision medicine: a review. JAMA Oncol 2(12):1636–1642. https://doi.org/10.1001/jamaoncol.2016.2631
6. Spak DA, Plaxco JS, Santiago L, Dryden MJ, Dogan BE (2017) BI-RADS® fifth edition: a summary of changes. Diagn Interv Imaging 98(3):179–190. https://doi.org/10.1016/j.diii.2017.01.001
7. Haralick RM, Shanmugam K, Dinstein I (1973) Textural features for image classification. IEEE Trans Syst Man Cybernet SMC-3(6):610–621. https://doi.org/10.1109/Tsmc.1973.4309314
8. Galloway MM (1974) Texture analysis using grey level run lengths. Comput Graph Image Process 4(2):172–179. https://doi.org/10.1016/s0146-664x(75)80008-6
9. Thibault G, Fertil B, Navarro C, Pereira S, Cau P, Levy N et al (2013) Shape and texture indexes application to cell nuclei classification. Int J Pattern Recogn Artif Intell 27(1):1357002. https://doi.org/10.1142/S021800141357002
10. Mallat SG (1989) A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell 11(7):674–693. https://doi.org/10.1109/34.192463
11. Davnall F, Yip CS, Ljungqvist G, Selmi M, Ng F, Sanghera B et al (2012) Assessment of tumor heterogeneity: an emerging imaging tool for clinical practice? Insights Imaging 3(6):573–589. https://doi.org/10.1007/s13244-012-0196-6
12. O'Connor JP, Rose CJ, Waterton JC, Carano RA, Parker GJ, Jackson A (2015) Imaging intratumor heterogeneity: role in therapy response, resistance, and clinical outcome. Clin Cancer Res 21(2):249–257. https://doi.org/10.1158/1078-0432.CCR-14-0990
13. Jethanandani A, Lin TA, Volpe S, Elhalawani H, Mohamed ASR, Yang P et al (2018) Exploring applications of radiomics in magnetic resonance imaging of head and neck cancer: a systematic review. Front Oncol 8:131. https://doi.org/10.3389/fonc.2018.00131
14. Weinreb JC, Barentsz JO, Choyke PL, Cornud F, Haider MA, Macura KJ et al (2016) PI-RADS prostate imaging - reporting and data system: 2015, version 2. Eur Urol 69(1):16–40. https://doi.org/10.1016/j.eururo.2015.08.052
15. Sun Y, Reynolds HM, Parameswaran B, Wraith D, Finnegan ME, Williams S et al (2019) Multiparametric MRI and radiomics in prostate cancer: a review. Australas Phys Eng Sci Med 42(1):3–25. https://doi.org/10.1007/s13246-019-00730-z
16. Zhou M, Scott J, Chaudhury B, Hall L, Goldgof D, Yeom KW et al (2018) Radiomics in brain tumor: image assessment, quantitative feature descriptors, and machine-learning approaches. AJNR Am J Neuroradiol 39(2):208–216. https://doi.org/10.3174/ajnr.A5391
17. Crivelli P, Ledda RE, Parascandolo N, Fara A, Soro D, Conti M (2018) A new challenge for radiologists: radiomics in breast cancer. Biomed Res Int 2018:6120703. https://doi.org/10.1155/2018/6120703
18. Valdora F, Houssami N, Rossi F, Calabrese M, Tagliafico AS (2018) Rapid review: radiomics and breast cancer. Breast Cancer Res Treat 169(2):217–229. https://doi.org/10.1007/s10549-018-4675-4
19. Liu CL, Ding J, Spuhler K, Gao Y, Serrano Sosa MS, Moriarty M et al (2019) Preoperative prediction of sentinel lymph node metastasis in breast cancer by radiomic signatures from dynamic contrast-enhanced MRI. J Magn Reson Imaging 49(1):131–140. https://doi.org/10.1002/jmri.26224
20. Feng F, Wang P, Zhao K, Zhou B, Yao HX, Meng QQ et al (2018) Radiomic features of hippocampal subregions in Alzheimer's disease and amnestic mild cognitive impairment. Front Aging Neurosci 10:290. https://doi.org/10.3389/fnagi.2018.00290
21. Zhang YY, Moore GR, Laule C, Bjarnason TA, Kozlowski P, Traboulsee A et al (2013) Pathological correlates of magnetic resonance imaging texture heterogeneity in multiple sclerosis. Ann Neurol 74(1):91–99. https://doi.org/10.1002/ana.23867
22. Feng R, Badgeley M, Mocco J, Oermann EK (2018) Deep learning guided stroke management: a review of clinical applications. J Neurointerv Surg 10(4):358–362. https://doi.org/10.1136/neurintsurg-2017-013355
23. Kassner A, Thornhill RE (2010) Texture analysis: a review of neurologic MR imaging applications. AJNR Am J Neuroradiol 31(5):809–816. https://doi.org/10.3174/ajnr.A2061
24. Schwier M, van Griethuysen J, Vangel MG, Pieper S, Peled S, Tempany C et al (2019) Repeatability of multiparametric prostate MRI radiomics features. Sci Rep 9(1):9441. https://doi.org/10.1038/s41598-019-45766-z
25. Whybra P, Parkinson C, Foley K, Staffurth J, Spezi E (2019) Assessing radiomic feature robustness to interpolation in ¹⁸F-FDG PET imaging. Sci Rep 9(1):9649. https://doi.org/10.1038/s41598-019-46030-0
26. Baessler B, Weiss K, Pinto Dos Santos D (2019) Robustness and reproducibility of radiomics in magnetic resonance imaging: a phantom study. Investig Radiol 54(4):221–228. https://doi.org/10.1097/RLI.0000000000000530
27. Molina D, Pérez-Beteta J, Martínez-González A, Martino J, Velasquez C, Arana E et al (2017) Lack of robustness of textural measures obtained from 3D brain tumor MRIs impose a need for standardization. PLoS One 12(6):e0178843. https://doi.org/10.1371/journal.pone.0178843
28. Zwanenburg A, Leger S, Agolli L, Pilz K, Troost EGC, Richter C et al (2019) Assessing robustness of radiomic features by image perturbation. Sci Rep 9(1):614. https://doi.org/10.1038/s41598-018-36938-4
29. Tanaka S, Kadoya N, Kajikawa T, Matsuda S, Dobashi S, Takeda K et al (2019) Investigation of thoracic four-dimensional CT-based dimension reduction technique for extracting the robust radiomic features. Phys Med 58:141–148. https://doi.org/10.1016/j.ejmp.2019.02.009
30. Mori M, Benedetti G, Partelli S, Sini C, Andreasi V, Broggi S et al (2019) CT radiomic features of pancreatic neuroendocrine neoplasms (panNEN) are robust against delineation uncertainty. Phys Med 57:41–46. https://doi.org/10.1016/j.ejmp.2018.12.005
31. Saha A, Harowicz MR, Mazurowski MA (2018) Breast cancer MRI radiomics: an overview of algorithmic features and impact of inter-reader variability in annotating tumors. Med Phys 45(7):3076–3085. https://doi.org/10.1002/mp.12925
32. Bologna M, Corino VDA, Montin E, Messina A, Calareso G, Greco FG et al (2018) Assessment of stability and discrimination capacity of radiomic features on apparent diffusion coefficient images. J Digit Imaging 31(6):879–894. https://doi.org/10.1007/s10278-018-0092-9
33. Peerlings J, Woodruff HC, Winfield JM, Ibrahim A, Van Beers BE, Heerschap A et al (2019) Stability of radiomics features in apparent diffusion coefficient maps from a multi-Centre test-retest trial. Sci Rep 9(1):4800. https://doi.org/10.1038/s41598-019-41344-5
34. Aerts HJWL, Velazquez ER, Leijenaar RTH, Parmar C, Grossmann P, Carvalho S et al (2014) Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun 5(1):4006. https://doi.org/10.1038/ncomms5644
35. Leijenaar RTH, Carvalho S, Velazquez ER, van Elmpt WJC, Parmar C, Hoekstra OS et al (2013) Stability of FDG-PET Radiomics features: an integrated analysis of test-retest and inter-observer variability. Acta Oncol 52(7):1391–1397. https://doi.org/10.3109/0284186X.2013.812798
36. Gudmundsson S, Runarsson TP, Sigurdsson S (2012) Test-retest reliability and feature selection in physiological time series classification. Comput Methods Prog Biomed 105(1):50–60. https://doi.org/10.1016/j.cmpb.2010.08.005
37. Lecler A, Duron L, Balvay D, Savatovsky J, Bergès O, Zmuda M et al (2019) Combining multiple magnetic resonance imaging sequences provides independent reproducible radiomics features. Sci Rep 9(1):2068. https://doi.org/10.1038/s41598-018-37984-8
38. Fiset S, Welch ML, Weiss J, Pintilie M, Conway JL, Milosevic M et al (2019) Repeatability and reproducibility of MRI-based radiomic features in cervical cancer. Radiother Oncol 135:107–114. https://doi.org/10.1016/j.radonc.2019.03.001
39. Duron L, Balvay D, Vande Perre S, Bouchouicha A, Savatovsky J, Sadik JC et al (2019) Gray-level discretization impacts reproducible MRI radiomics texture features. PLoS One 14(3):e0213459. https://doi.org/10.1371/journal.pone.0213459
40. Chirra P, Leo P, Yim M, Bloch BN, Rastinehad AR, Purysko A et al (2018) Empirical evaluation of cross-site reproducibility in radiomic features for characterizing prostate MRI. In: abstracts of SPIE 10575, medical imaging 2018: computer-aided diagnosis, SPIE, Houston, Texas, United States, 27 February 2018. https://doi.org/10.1117/12.2293992
41. Gourtsoyianni S, Doumou G, Prezzi TB, Stirling JJ, Taylor NJ et al (2017) Primary rectal cancer: repeatability of global and local-regional MR imaging texture features. Radiology 284(2):552–561. https://doi.org/10.1148/radiol.2017161375
42. Zwanenburg A, Leger S, Vallières M, Löck S, for the Image Biomarker Standardisation Initiative (2016) Image biomarker standardisation initiative. https://doi.org/10.17195/candat.2016.08.1
43. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J et al (2017) Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol 14(12):749–762. https://doi.org/10.1038/nrclinonc.2017.141
44. Traverso A, Wee L, Dekker A, Gillies R (2018) Repeatability and reproducibility of radiomic features: a systematic review. Int J Radiat Oncol Biol Phys 102(4):1143–1158. https://doi.org/10.1016/j.ijrobp.2018.05.053
45. Zhovannik I, Bussink J, Traverso A, Shi ZW, Kalendralis P, Wee L et al (2019) Learning from scanners: bias reduction and feature correction in radiomics. Clin Trans Radiat Oncol 19:33–38. https://doi.org/10.1016/j.ctro.2019.07.003
46. Vuong D, Tanadini-Lang S, Huellner MW, Veit-Haibach P, Unkelbach J, Andratschke N et al (2019) Interchangeability of radiomic features between [18F]-FDG PET/CT and [18F]-FDG PET/MR. Med Phys 46(4):1677–1685. https://doi.org/10.1002/mp.13422
47. Papp L, Rausch I, Grahovac M, Hacker M, Beyer T (2019) Optimized feature extraction for radiomics analysis of ¹⁸F-FDG PET imaging. J Nucl Med 60(6):864–872. https://doi.org/10.2967/jnumed.118.217612
48. Forgács A, Béresová M, Garai I, Lassen ML, Beyer T, DiFranco MD et al (2019) Impact of intensity discretization on textural indices of [¹⁸F]FDG-PET tumour heterogeneity in lung cancer patients. Phys Med Biol 64(12):125016. https://doi.org/10.1088/1361-6560/ab2328
49. Yip SSF, Aerts HJWL (2016) Applications and limitations of radiomics. Phys Med Biol 61(13):R150–R166. https://doi.org/10.1088/0031-9155/61/13/r150
50. Traverso A, Kazmierski M, Shi ZW, Kalendralis P, Welch M, Nissen HD et al (2019) Stability of radiomic features of apparent diffusion coefficient (ADC) maps for locally advanced rectal cancer in response to image pre-processing. Phys Med 61:44–51. https://doi.org/10.1016/j.ejmp.2019.04.009
51. Um H, Tixier F, Bermudez D, Deasy JO, Young RJ, Veeraraghavan H (2019) Impact of image preprocessing on the scanner dependence of multi-parametric MRI radiomic features and covariate shift in multi-institutional glioblastoma datasets. Phys Med Biol 64(16):165011. https://doi.org/10.1088/1361-6560/ab2f44
52. Buch K, Kuno H, Qureshi MM, Li BJ, Sakai O (2018) Quantitative variations in texture analysis features dependent on MRI scanning parameters: a phantom model. J Appl Clin Med Phys 19(6):253–264. https://doi.org/10.1002/acm2.12482
53. Yang F, Dogan N, Stoyanova R, Ford JC (2018) Evaluation of radiomic texture feature error due to MRI acquisition and reconstruction: a simulation study utilizing ground truth. Phys Med 50:26–36. https://doi.org/10.1016/j.ejmp.2018.05.017
54. Brynolfsson P, Nilsson D, Torheim T, Asklund T, Karlsson CT, Trygg J et al (2017) Haralick texture features from apparent diffusion coefficient (ADC) MRI images depend on imaging and pre-processing parameters. Sci Rep 7(1):4041. https://doi.org/10.1038/s41598-017-04151-4
55. Guan Y, Li WF, Jiang ZR, Chen Y, Liu S, He J et al (2016) Whole-lesion apparent diffusion coefficient-based entropy-related parameters for characterizing cervical cancers: initial findings. Acad Radiol 23(12):1559–1567. https://doi.org/10.1016/j.acra.2016.08.010
56. Molina D, Pérez-Beteta J, Martínez-González A, Martino J, Velásquez C, Arana E et al (2016) Influence of gray level and space discretization on brain tumor heterogeneity measures obtained from magnetic resonance images. Comput Biol Med 78:49–57. https://doi.org/10.1016/j.compbiomed.2016.09.011
57. Savio SJ, Harrison LCV, Luukkaala T, Heinonen T, Dastidar P, Soimakallio S et al (2010) Effect of slice thickness on brain magnetic resonance image texture analysis. Biomed Eng Online 9:60. https://doi.org/10.1186/1475-925X-9-60
58. Mayerhoefer ME, Szomolanyi P, Jirak D, Materka A, Trattnig S (2009) Effects of MRI acquisition parameter variations and protocol heterogeneity on the results of texture analysis and pattern discrimination: an application-oriented study. Med Phys 36(4):1236–1243. https://doi.org/10.1118/1.3081408
59. Collewet G, Strzelecki M, Mariette F (2004) Influence of MRI acquisition protocols and image intensity normalization methods on texture classification. Magn Reson Imaging 22(1):81–91. https://doi.org/10.1016/j.mri.2003.09.001
60. Mackin D, Fave X, Zhang LF, Fried D, Yang JZ, Taylor B et al (2015) Measuring computed tomography scanner variability of radiomics features. Investig Radiol 50(11):757–765. https://doi.org/10.1097/RLI.0000000000000180
61. Vallières M, Freeman CR, Skamene SR, El Naqa I (2015) A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities. Phys Med Biol 60(14):5471–5496. https://doi.org/10.1088/0031-9155/60/14/5471
62. Liebgott A, Küstner T, Strohmeier H, Hepp T, Mangold P, Martirosian P et al (2018) ImFEATbox: a toolbox for extraction and analysis of medical image features. Int J Comput Assist Radiol Surg 13(12):1881–1893. https://doi.org/10.1007/s11548-018-1859-7
63. Parmar C, Rios Velazquez E, Leijenaar R, Jermoumi M, Carvalho S, Mak RH et al (2014) Robust radiomics feature quantification using semiautomatic volumetric segmentation. PLoS One 9(7):e102107. https://doi.org/10.1371/journal.pone.0102107
64. van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V et al (2017) Computational radiomics system to decode the radiographic phenotype. Cancer Res 77(21):e104–e107. https://doi.org/10.1158/0008-5472.CAN-17-0339
65. Koo TK, Li MY (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15(2):155–163. https://doi.org/10.1016/j.jcm.2016.02.012

Acknowledgements

None.

Funding

This work was funded in part by the Walk-for-Beauty Foundation and the Carol M. Baldwin Breast Cancer Research Foundation. None of the funding bodies participated in the design of the study, in the collection, analysis or interpretation of data, or in manuscript preparation.

Author information

Authors and Affiliations

Department of Biomedical Engineering, Stony Brook University, Stony Brook, NY, 11794, USA
Renee Cattell, Shenglan Chen & Chuan Huang
Department of Radiology, Stony Brook Medicine, Stony Brook, NY, 11794, USA
Chuan Huang
Department of Psychiatry, Stony Brook Medicine, Stony Brook, NY, 11794, USA
Chuan Huang

Contributions

All authors participated in the study design, data acquisition, manuscript preparation, data interpretation, literature review and summary. All authors approved the submitted version. Renee Cattell and Shenglan Chen contributed equally to this paper.

Corresponding author

Correspondence to Chuan Huang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Cattell, R., Chen, S. & Huang, C. Robustness of radiomic features in magnetic resonance imaging: review and a phantom study. Vis. Comput. Ind. Biomed. Art 2, 19 (2019). https://doi.org/10.1186/s42492-019-0025-6

Received: 16 August 2019
Accepted: 09 October 2019
Published: 20 November 2019
DOI: https://doi.org/10.1186/s42492-019-0025-6