
Robustness of radiomic features in magnetic resonance imaging: review and a phantom study


Radiomic analysis has exponentially increased the amount of quantitative data that can be extracted from a single image. These imaging biomarkers can aid in the generation of prediction models aimed to further personalized medicine. However, the generalizability of the model is dependent on the robustness of these features. The purpose of this study is to review the current literature regarding robustness of radiomic features on magnetic resonance imaging. Additionally, a phantom study is performed to systematically evaluate the behavior of radiomic features under various conditions (signal to noise ratio, region of interest delineation, voxel size change and normalization methods) using intraclass correlation coefficients. The features extracted in this phantom study include first order, shape, gray level cooccurrence matrix and gray level run length matrix. Many features are found to be non-robust to changing parameters. Feature robustness assessment prior to feature selection, especially in the case of combining multi-institutional data, may be warranted. Further investigation is needed in this area of research.


Overview of radiomics

Radiomics is the extraction of high-dimensional and quantitative mineable data from digital medical images [1,2,3]. The prefix “radio-” refers to the use of radiological images; these digital medical images can come from various modalities, but are most frequently computed tomography (CT), positron emission tomography (PET) and magnetic resonance imaging (MRI) [1, 2]. Patients often receive numerous imaging studies to diagnose, stage, plan treatment and monitor disease progression. Currently in clinical practice, imaging data are used only qualitatively or semi-quantitatively, and a dictated report is created by the radiologist. Radiomic analysis aims to maximize the amount of quantitative information extracted from existing medical images, including information that may not be appreciable to the naked eye, so that it can be used for patient care. The digital image is analyzed by mathematical algorithms and/or filtering of the data to yield quantitative values. These features are termed quantitative imaging biomarkers and can be classified into 2 groups: semantic and agnostic.

Semantic features can be either qualitatively defined by a radiologist or quantitatively defined by a mathematical algorithm. Examples of semantic features include size, shape, location, vascularity, and spiculation [1, 2]. These are descriptors that are commonly used by radiologists in a qualitative fashion to identify and characterize disease, such as in the case of breast tumors, where tumor size is indicative of treatment response (Response Evaluation Criteria in Solid Tumors) and spiculation indicates a higher chance of malignancy (Breast Imaging Reporting and Data System) [1, 4,5,6]. Quantitative extraction of semantic features is desired to give a more comprehensive and reproducible description of the region of interest (ROI), whereas visual inspection by a radiologist has large intra- and inter-reader variability [5].

Agnostic features aim to quantify the heterogeneity within a ROI based on image intensity. Agnostic features can be further broken down into first order features, second-order features and higher-order features:

First order features are commonly histogram-based and examine gray level signal intensity within a ROI independent of spatial relationships between adjacent voxels. Examples of these features include uniformity, entropy, mean, median and kurtosis [1, 2].

Second-order features, commonly referred to as “texture” features, examine the spatial relationship between gray level signal intensities by constructing a gray-level dependence matrix [1, 2]. These features give a measure of intra-region heterogeneity. They were first explored by Haralick et al. [7], who introduced the gray level cooccurrence matrix (GLCM), which counts the occurrence of different gray level voxel pairs in different directions. As radiomics developed, these features expanded to include other ways of quantifying the spatial relationship between voxels, such as the gray level run length matrix (GLRLM), which quantifies the number of consecutive voxels with the same gray level [8], and the gray level zone length matrix, which quantifies the size of homogeneous areas of an image [9].

Higher-order features involve application of a filter or transformation to an image prior to feature extraction. These features aim to identify patterns or highlight details within the image that are not initially perceivable by the reader or are hard to interpret [1, 5]. An example of this type of feature is wavelet transform [10].
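To make the first order class above concrete, the following Python sketch computes a few of the histogram-based features named there (mean, median, kurtosis, entropy and uniformity) from a flat list of ROI intensities. It is an illustration only: the function name `first_order_features`, the fixed-bin-number histogram and the default bin count are assumptions made for this example, not the definitions used in the phantom study.

```python
import math
from collections import Counter


def first_order_features(intensities, n_bins=8):
    """Histogram-based first order features for the voxel intensities of one ROI.

    `n_bins` is an illustrative choice, not a value taken from the study.
    """
    n = len(intensities)
    ordered = sorted(intensities)
    mean = sum(ordered) / n
    mid = n // 2
    median = ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])

    # Population central moments; kurtosis is reported without subtracting 3.
    m2 = sum((x - mean) ** 2 for x in ordered) / n
    m4 = sum((x - mean) ** 4 for x in ordered) / n
    kurtosis = m4 / m2 ** 2 if m2 > 0 else float("nan")

    # Fixed-bin-number histogram between the ROI minimum and maximum.
    lo, hi = ordered[0], ordered[-1]
    width = (hi - lo) / n_bins or 1.0
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in ordered)
    probs = [count / n for count in counts.values()]

    return {
        "mean": mean,
        "median": median,
        "kurtosis": kurtosis,
        "entropy": -sum(p * math.log2(p) for p in probs),
        "uniformity": sum(p * p for p in probs),
    }


# Toy ROI with four equally populated intensity values.
features = first_order_features([1, 1, 2, 2, 3, 3, 4, 4], n_bins=4)
```

None of these values depends on where a voxel sits inside the ROI, which is what distinguishes first order features from the texture features described above.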

As such, this analysis has exponentially increased the amount of information that can be extracted from a single digital image. A single image may contain valuable sub-visual information of the tissue pathophysiology, phenotype and microenvironment that can be captured by quantitative analysis [2].

The suffix “-omics” refers to the combination of this massive number of quantitative features that can be extracted from a single ROI using mathematical/statistical methods with clinical characteristics to be used in clinical management of patients [1, 2]. A goal of radiomics is to identify robust and consistent imaging biomarkers to aid in clinical decision making, such as the diagnosis of a disease, monitoring of treatment response or prediction of prognosis [1]. This is a step towards “precision” or “personalized” medicine in which this large number of quantitative features from the image of a specific individual coupled with their individual clinical characteristics (age, genomic profiling, etc.) can be used to tailor treatment or assess risk [1, 2, 5].

A large area of study in the field of radiomics is oncology, a focus attributed to the Quantitative Imaging Network, funded by the National Institutes of Health, and the Quantitative Imaging Biomarker Alliance, organized by the Radiological Society of North America [2, 5]. Cancer has been noted to be a highly heterogeneous disease on both an inter-patient and intra-patient level [2, 11, 12]. Radiomics has many potential applications in oncology. There is a need for a non-invasive imaging biomarker to better characterize lesions, such as tumor aggressiveness, because a single needle biopsy cannot capture the entire landscape of a tumor [5]. In the case of a more aggressive tumor, it is possible that a more intensive treatment regimen may be tailored to those patients, resulting in an improved prognosis [11]. Additionally, characterizing a lesion as malignant or benign could be a useful tool for clinicians to make a more informed diagnosis, reducing stress for the patient and identifying the correct course of action. Furthermore, radiomic analysis could aid in the monitoring of treatment response; current criteria include mainly size and shape changes, whereas there may be subtle changes in the image appearance, not clinically appreciable to the naked eye, which are informative of response [5, 11]. It is possible that, in the case of a clearly non-responding tumor, the patient may be switched to a different/more effective therapy and avoid side effects associated with a treatment from which they are not expected to receive clinical benefit.

As previously mentioned, radiologic images including CT, PET and MRI have been used in radiomics studies. In this article, we focus on MRI. Each modality has its own characteristics which could affect the radiomic analysis. CT and PET have pixel/voxel values with a physical meaning, namely characterizing the x-ray attenuation of tissue through Hounsfield units and cellular activity through Standard Uptake Value, respectively. Thus, the diagnostic or prognostic implications resulting from radiomic analysis will have variable interpretations.

Radiomics in MRI


MRI is a commonly used modality for radiomic analysis owing to its rich contrast mechanisms (such as T1, T2, chemical exchange, diffusion, perfusion, contrast enhancement) and fine soft-tissue detail [13]. A majority of MRI radiomic analysis is performed in oncological applications such as head and neck, prostate, brain and breast cancer.

Head and neck cancer

Numerous studies have performed MRI radiomic analysis on head and neck cancer. Analyzed endpoints included pathological classification, segmentation and prognostic/predictive biomarkers of progression, survival or treatment, with reports of radiomic model performance showing promising results in most studies [13].

Prostate cancer

Multiparametric MRI is an important tool in the diagnosis of prostate cancer, with T2-weighted, dynamic contrast enhanced and diffusion weighted imaging being the core imaging sequences in the Prostate Imaging Reporting and Data System [14]. Detection of prostate cancer is the main focus of radiomics as it applies to prostate cancer, specifically with identification and delineation of the tumor region being the priority [15].

Brain cancer

MRI is a standard of care for brain tumors, most commonly in the form of the contrast-enhanced imaging which can identify tumor areas through their leaky vasculature and breakdown of the blood brain barrier. Main clinical applications of radiomics in brain cancer include prediction of prognosis (survival time), classification of glioblastoma subtypes and discrimination of radiation necrosis tissue from recurrent tumor tissue [16].

Breast cancer

MRI is the modality of choice for assessing extent of disease and monitoring treatment response in patients diagnosed with breast cancer. Similar to brain cancer, a dynamic contrast enhanced series is commonly performed to identify areas of increased, disorganized vascularity associated with malignancy. Studies performed have looked at differentiating benign from malignant lesions, prediction of treatment response, prediction of lymph node metastasis, prediction of molecular profile and prediction of risk of recurrence [17,18,19].


Aside from oncological applications, radiomic analysis has been explored in other pathologies such as Alzheimer’s disease, multiple sclerosis, ischemic stroke and epilepsy [20,21,22,23].

Steps of MRI radiomics

Radiomic analysis of MRI generally consists of 4 main steps: image acquisition, ROI segmentation, feature extraction and feature selection.

Image acquisition factors include scanner (make, model, field), coil, sequence [sequence type, echo time (TE), repetition time (TR), acceleration, voxel size, bandwidth, etc.] and reconstruction algorithm (parallel imaging, compressed sensing, regularization parameters, coil combination, etc.).

ROI segmentation includes automatic, semi-automatic or manual delineation of the ROI in the image.

Feature extraction includes pre-processing steps (normalization, binning to a defined number of gray levels) and application of mathematical algorithms or filters to calculate the feature within the ROI.

Feature selection and model construction includes reduction techniques to reduce the number of redundant features and selection by means of machine learning (least absolute shrinkage and selection operator, support vector machine, etc.).

Changing parameters at any step in the process could result in different feature values, and thus reduce the consistency and reliability of predictive performance. Although many of the parameters in this pipeline are easy to standardize, some of them are subject to greater variability in MRI radiomics.

Feature robustness in MRI radiomics

Importance of robustness of features in medical imaging

A fundamental requirement to draw reliable conclusions based on any radiomics imaging biomarker is that its value must be stable under different conditions and two measurements obtained under the same conditions must be consistent [24]. There is currently no consensus on how to assess the robustness [25,26,27,28,29,30] (others may refer to it as “stability” [31,32,33,34,35,36],“reproducibility” [26, 37,38,39,40] or “repeatability” [24, 38, 41]) of radiomic features. However, it is recommended in image biomarker standardization initiative (IBSI) [42] to perform feature robustness assessment prior to feature selection. It should be noted that robustness is not a guarantee of the features’ discriminative power and the predictive performance should be investigated [24]. Moreover, feature robustness could be application dependent [43], meaning that a feature that is found to be highly precise for a certain dataset/disease could have poor stability when assessed for another dataset/disease. Several studies [24, 28, 32, 37] emphasized that feature pre-selection based on stability should be performed to generate more reliable results and reduce data dimensionality.

Robustness analysis in MRI

Most of the existing publications assessing image biomarker robustness investigated radiomic features from CT and PET images [30, 44,45,46,47,48]. It was stated in a review paper in 2016 [49] that “the repeatability of MR-based radiomic features has not been investigated”. Since then, some studies have investigated the robustness of MRI radiomic analysis but, owing to a lack of standardization, they frequently reach inconsistent conclusions. We performed a literature search of peer-reviewed full-text articles that analyzed feature robustness based on MRI and summarized them in Table 1 (16 on human subjects, and 5 exclusively on phantoms). These publications have assessed parameters such as vendor [33, 40, 51], scanner [31, 33], acquisition parameters [52, 59], observers [26, 37, 39, 50] and pre-processing parameters [24, 38, 50, 53, 54]; however, much remains to be investigated.

Table 1 Summary of literature for magnetic resonance imaging radiomics feature robustness

The importance of complete and clear reporting was also highlighted in several studies. IBSI [42] presented informative reporting guidelines on image pre-processing and feature extraction. Additionally, the radiomics quality score was proposed by the D-Lab [43]; this assigns a value based on 16 key points on the reporting of radiomics studies. With the aid of these two standards, it was found that many studies were lacking in the clear and concise description of (1) software implementation (i.e., chosen setting parameters, equations), (2) pre-processing steps (i.e., normalization, quantization) and (3) statistical methods used to quantify or assess feature robustness [i.e., form of intraclass correlation coefficient (ICC)]. Additionally, use of an external validation set is an important step in feature robustness analysis that was lacking in many of these studies.

We believe one option to improve robustness analysis of MRI radiomics studies is to systematically evaluate the behavior of the radiomic features under various conditions. With a well-defined “dictionary” of robust features, researchers can perform a pre-selection step based on their specific application. Here, we demonstrate such an effort by evaluating feature robustness to MRI image signal to noise ratio (SNR), ROI delineation, small voxel size variation and normalization methods through a phantom study. The workflow of the study is displayed in Fig. 1. We measure the degree of robustness using ICC (2-way mixed-effects model, single rater, absolute agreement) and separate features into three groups based on ICC values: high (> 0.9), moderate (0.5–0.9) and low (< 0.5) for each of the conditions investigated.

Fig. 1

Schematic representation of workflow in this study. Image segmentation is performed manually on a single image. The ROIs are interpolated to images of different in-plane resolutions for voxel size analysis. Gaussian noise is added to generate different signal to noise ratio steps and generate 10 different noise realizations for test-retest analysis. Shape, first order, GLCM and GLRLM features are calculated for each ROI. GLCM and GLRLM features are calculated after normalization (mean ± 3SD or zero to maximum) and discretization (64 gray levels). ROI Region of interest, GLCM Gray level cooccurrence matrix, GLRLM Gray level run length matrix

Results and discussion


In MRI, there are many factors affecting the SNR of an image even if all acquisition parameters are set to the same values and acquisitions are performed on the same scanner. Examples of these factors include coil load, analog-to-digital gain, shimming, reconstruction method and size of the patient. In fact, due to the inhomogeneity of coil sensitivity, SNR can even vary within the same slice of image. This can be due to both B1+ (transmit) and B1- (receiving) properties of the coil. In this study, we systematically evaluate the effect of several levels of SNR using phantom data with added Gaussian noise. We also analyze the effect of two normalization methods on the radiomic results.

T2 weighted phantom images used in the analysis are shown in Fig. 2a, with ROIs drawn on a pineapple core (red), banana (blue), orange (orange) and kiwi (green). Regions of interest used in SNR calculation are shown in Fig. 2b.

Fig. 2

Image of (a) regions of interest under investigation in this study, namely pineapple core (red), banana (blue), orange (orange) and kiwi (green), and (b) regions of interest used for signal to noise ratio calculation

Complex Gaussian noise was added to the original image (Fig. 3c) and magnitude images were used for the analysis. Two noise levels [SNR 45 (Fig. 3a) and SNR 75 (Fig. 3b)] were generated from the original image whose SNR is 124. To the naked eye, there is no large visual difference between SNR of 45 and SNR of 75. These SNR levels are representative of those seen in clinical images. As mentioned above, SNR is spatially varying in MRI; the SNR values used here simply represent the overall noise level of the image.

Fig. 3

Magnitude images at different signal to noise ratio (SNR) steps: (a) SNR = 45, (b) SNR = 75 and (c) SNR = 124

Shape features were omitted from this part of the analysis because the same ROI was used across all SNR steps. This portion of the study aimed to analyze only the effect of added noise, and not intra- or inter-reader variability in ROI delineation. Details of the study are described in the Methods section; in summary, the three most commonly used types of features (first order features, GLCM features, and GLRLM features) were studied using 10 different noise realizations and 2 different normalization techniques. The features within each group and their respective ICCs (2-way mixed-effects model, single rater, absolute agreement) are summarized in Table 2. The results using the first normalization technique (mean ± 3SD) are shown in Table 3 and Fig. 4a. The majority of first order features (11 out of 13) have an ICC greater than 0.9, indicating high robustness to added noise. However, only 5 out of 22 GLCM features have an ICC greater than 0.9. A majority of the GLCM features (14 out of 22) were found to be of moderate robustness, represented by ICC between 0.5 and 0.9. All GLRLM features were found to have moderate robustness (0.5–0.9).

Table 2 Average of intraclass correlation coefficient value over 10 noise realizations in reference to variation in signal to noise ratio, region of interest dilation/erosion and small variation in voxel size
Table 3 Number of features of high, moderate and low robustness in each feature class, as defined by average of intraclass correlation coefficient over 10 noise realizations, in reference to signal to noise variation with normalization of mean ± 3SD or zero to maximum
Fig. 4

Average intraclass correlation coefficient over 10 noise realizations of first order, GLCM and GLRLM features by using (a) mean ± 3SD and (b) zero to maximum normalization for signal to noise analysis. ICC Intraclass correlation coefficient, GLCM Gray level cooccurrence matrix, GLRLM Gray level run length matrix

Second order texture features, namely GLCM and GLRLM, are impacted by the normalization procedure. The prior SNR analysis used mean ± 3SD for normalization. Analysis was also performed by using zero to maximum normalization. Each method has its respective limitations. Mean ± 3SD normalization should be able to provide better separation due to a decrease in dynamic range, as compared to zero to maximum normalization, making it more sensitive to small changes. However, mean ± 3SD is more likely to be sensitive to noise. Results using the zero to maximum normalization procedure are summarized in Table 3 and Fig. 4b. First order features are not affected by normalization/quantization because they directly use all intensity values independently. As compared to the mean ± 3SD method, for GLCM features there is a trend toward higher ICC values, with no features in the low robustness group (ICC < 0.5). For GLRLM features, there is a similar trend, with a higher proportion of features in the high robustness category (ICC > 0.9). As mentioned previously, Table 2 includes the full list of features and their respective ICC values. Clustering is observed in the ICC plots. It is hypothesized that this occurs because (1) a limited number of regions of interest are being compared, and (2) calculated features may be highly correlated.
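The difference between the two normalization windows can be illustrated with a short Python sketch that rescales ROI intensities to either mean ± 3SD or zero to maximum and then quantizes them to 64 gray levels. The function name `discretize`, the clipping of values outside the window and the use of the population standard deviation are assumptions made for this example; the in-house software used in the study may differ in these details.

```python
import statistics


def discretize(intensities, method="mean3sd", n_levels=64):
    """Rescale ROI intensities and quantize them to gray levels 0..n_levels-1."""
    if method == "mean3sd":
        mu = statistics.fmean(intensities)
        sd = statistics.pstdev(intensities)  # population SD (an assumption)
        lo, hi = mu - 3 * sd, mu + 3 * sd
    elif method == "zero_to_max":
        lo, hi = 0.0, max(intensities)
    else:
        raise ValueError(f"unknown normalization method: {method}")

    if hi <= lo:  # flat ROI: everything falls into the first gray level
        return [0] * len(intensities)

    levels = []
    for x in intensities:
        clipped = min(max(x, lo), hi)  # values outside the window saturate
        level = int((clipped - lo) / (hi - lo) * n_levels)
        levels.append(min(level, n_levels - 1))
    return levels


# Toy ROI: bright tissue with modest contrast.
roi = [100, 110, 120, 130, 140]
levels_m3 = discretize(roi, "mean3sd")
levels_zm = discretize(roi, "zero_to_max")
spread_m3 = max(levels_m3) - min(levels_m3)
spread_zm = max(levels_zm) - min(levels_zm)
```

For this toy ROI the mean ± 3SD window spreads the same intensities over more gray levels than the zero to maximum window does, which is the "better separation" referred to above and also the reason small noise-induced intensity changes are more likely to move a voxel into a different gray level.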

ROI delineation

In practice, intra- and inter-reader variability in the manual segmentation of regions of interest is inevitable. Subjective determination of abnormal tissue may not be consistent across readers due to variables such as difference in experience or difference in contrast windowing. The effect of ROI dilation and erosion was therefore studied to evaluate feature robustness to ROI variations.

Two types of ROI manipulations were performed: dilation (by 1 pixel) and erosion (also by 1 pixel) as shown in Fig. 5. Similar to above, analysis was performed using 2 different normalization techniques: mean ± 3SD and zero to maximum.
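The two ROI manipulations can be sketched in a few lines of Python operating on a binary mask stored as a list of rows. The sketch assumes a 3 × 3 square structuring element, as stated in the Methods; the helper names and the border handling (pixels outside the image count as background for dilation and as foreground for erosion) are assumptions made for this example and are not taken from the study's MATLAB code.

```python
def _window(mask, r, c, outside):
    """Values in the 3 x 3 neighbourhood of (r, c); `outside` pads the border."""
    rows, cols = len(mask), len(mask[0])
    return [
        mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else outside
        for rr in (r - 1, r, r + 1)
        for cc in (c - 1, c, c + 1)
    ]


def dilate(mask):
    """A pixel is set if ANY pixel in its 3 x 3 neighbourhood is set."""
    return [
        [int(any(_window(mask, r, c, outside=0))) for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


def erode(mask):
    """A pixel stays set only if ALL pixels in its 3 x 3 neighbourhood are set."""
    return [
        [int(all(_window(mask, r, c, outside=1))) for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


# Toy ROI: a 3 x 3 block in the middle of a 5 x 5 image.
roi = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dilated = dilate(roi)  # grows by one pixel on every side
eroded = erode(roi)    # shrinks by one pixel on every side
```

Erosion can only remove voxels that were already inside the ROI, whereas dilation adds voxels from outside it.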

Fig. 5

Dilation and erosion of region of interest (ROI), with the inner most (blue) ring being the eroded ROI, the center (red) ring being the original ROI and the outermost (green) ring being the dilated ROI for (a) pineapple core, (b) kiwi, (c) orange and (d) banana

For ROI erosion using mean ± 3SD normalization, results are summarized in Table 4 and Fig. 6a. All 10 shape features and 20 out of 22 GLCM features are found to be highly robust. However, only 10 out of 13 first order features and 6 out of 11 GLRLM features are found to be highly robust to ROI erosion. No feature is found to have an ICC less than 0.5. Results using zero to maximum normalization are summarized in Table 4 and Fig. 6b. By definition, first order and shape features are not affected by normalization differences. There is an upward trend in the robustness of GLRLM features, all of which are highly robust to ROI erosion when zero to maximum normalization is used.

Table 4 Number of features of high, moderate and low robustness in each feature class, as defined by average of intraclass correlation coefficient over 10 noise realizations in reference to erosion of region of interest with normalization of mean ± 3SD or zero to maximum
Fig. 6

Average ICC over 10 noise realizations of first order, shape, GLCM and GLRLM features with (a and b) erosion of region of interest by one pixel with mean ± 3SD or zero to maximum normalization, respectively, and (c and d) dilation of region of interest by one pixel with mean ± 3SD or zero to maximum normalization, respectively. ICC Intraclass correlation coefficient, GLCM Gray level cooccurrence matrix, GLRLM Gray level run length matrix

For ROI dilation, mean ± 3SD normalization results are summarized in Table 5 and Fig. 6c. Shape is a highly robust feature. However, the other feature categories have relatively poorer robustness, with only 7 out of 13, 15 out of 22 and 7 out of 11 features with ICC greater than 0.9 for first order, GLCM and GLRLM groups, respectively. Table 2 lists individual features and their respective ICC values. Zero to maximum normalization results are summarized in Table 5 and Fig. 6d. There is an upward trend of ICC values using zero to maximum normalization method. Similar clustering is observed within ICC plots as described previously.

Table 5 Number of features of high, moderate and low robustness in each feature class, as defined by average of intraclass correlation coefficient over 10 noise realizations in reference to dilation of region of interest with normalization of mean ± 3SD or zero to maximum

As expected, dilation resulted in poorer robustness when compared to erosion. This is because dilation may incorporate tissue that is outside the ROI, whereas erosion still only includes voxels in the original ROI. It is noted that in our study dilation of the ROI may include "fruit skin", which can be highly different in visual appearance from the interior, or surrounding air. In a non-phantom study, such as one with a tumor ROI, overestimation or dilation of the ROI would likely include surrounding tissue and not surrounding air. However, there are tumors which are located next to air cavities, such as nasopharyngeal cancer, and robustness of features to dilation may be application based. The result of this comparison indicates that it may be more beneficial to be conservative when defining an ROI.

Small voxel size variation

In order to accommodate the different sizes of patients, it is general practice for the technologist to adjust the field of view (FOV) on the fly without changing other parameters. Strictly speaking, however, changing the FOV will always affect some other parameters, such as TE, bandwidth and gradient slew rate, which in turn affect SNR. The effect of these small voxel size variations, and its relation to radiomic feature robustness, is understudied. In this part of the study, variation of voxel size was introduced by acquiring images with a slight change of the FOV and matrix size. To remove the effect of SNR variations caused by pixel size changes, all images were normalized to the same SNR. Previous studies have tried to solve this problem by performing interpolation; however, interpolation introduces other complications and affects feature robustness [27].

The same slice was acquired with 4 different in-plane resolutions of 0.47, 0.50, 0.56 and 0.67 mm as shown in Fig. 7a-d, respectively. All other parameters were kept the same when possible. The SNRs of individual images were normalized to an SNR level of 75 by adding Gaussian noise and 10 different noise realizations were performed numerically. Results with mean ± 3SD normalization are summarized in Table 6 and Fig. 8a. Even though minor voxel size variation will affect the ROI, which in turn affects shape features, all shape features were found to be robust to minor voxel size variations. The first order, GLCM and GLRLM feature groups are found to have 8 out of 13, 12 out of 22 and 6 out of 11 features, respectively, that are highly robust to small differences in voxel size. Individual feature ICCs are reported in Table 2. Results for zero to maximum normalization are summarized in Table 6 and Fig. 8b. Similar upward trends in ICC of GLCM and GLRLM are noted. Similar clustering is observed within ICC plots as described previously.

Fig. 7

Image of small variation in pixel size achieved by changes in acquisition parameters: (a) 0.47 mm, (b) 0.50 mm, (c) 0.56 mm and (d) 0.67 mm

Table 6 Number of features of high, moderate and low robustness in each feature class, as defined by average of intraclass correlation coefficient over 10 noise realizations in reference to pixel size with normalization of mean ± 3SD or zero to maximum
Fig. 8

Average ICC over 10 noise realizations of first order, shape, GLCM and GLRLM features with small variation in voxel size with (a) mean ± 3SD and (b) zero to maximum normalization for voxel size variation. ICC Intraclass correlation coefficient, GLCM Gray level cooccurrence matrix, GLRLM Gray level run length matrix

Small variability in voxel size does not result in a large visual difference; however, differences are observed in the extracted radiomic features, as reported here. Since small variation in voxel size can reduce robustness, the effect is expected to be even more concerning when comparing voxel sizes that differ more widely. Especially in multi-institutional studies, it is common to see a large range of different voxel sizes used in analysis.


Our study has several limitations. Firstly, the results of a phantom study cannot always be transferred to clinical studies. However, we note that the robustness of radiomic features is application dependent, and phantoms can still be used to investigate the feature pre-selection pipeline. One way to show the transferability of a phantom study is to compare the variability of each feature obtained from the phantom to that calculated from tumors [60]. Secondly, we investigated only one sequence on one particular scanner. Although there are fundamental differences between scanners, inter-scanner variability could be addressed if the bias is corrected in the image preprocessing step [51]. Lastly, we only investigated 2D radiomic features of certain classes. Future work should explore the robustness of 3D features, including filter-based features, from multi-scanner images combined with clinical data.


Radiomic analysis is a step towards personalized medicine through an exponential increase in the amount of quantitative data that can be extracted from medical images. In current literature, feature robustness in MRI is understudied and feature extraction techniques are not universally standardized. There is a need for systematic evaluation of feature robustness. This is required to ensure that a predictive biomarker is reproducible and generalizable, especially across different institutions where parameters can be highly variable. An application-based feature pre-selection step will be pivotal in anticipation of the incorporation of radiomics-based tools into the clinic.


Phantom MR imaging

A pineapple, a gold kiwi, an orange, a banana and a strawberry placed on a Styrofoam box served as the radiomics phantom for our study. All images were acquired on a 3 T Siemens scanner (Biograph mMR) with a T2-weighted Turbo Spin Echo sequence using a 12 channel PET compatible head-coil. Acquisition parameters: echo train length = 18, TE = 98 ms, TR = 7360 ms, slice thickness/gap = 2/0 mm, pixel bandwidth = 219 Hz, flip angle = 150 degrees, 100% phase sampling, 100% phase FOV, body coil transmission, 1 average. Different axial resolutions were acquired by changing matrix size and FOV with parameters listed in Table 7.

Table 7 Voxel size, matrix size and field of view used in the voxel size variation analysis

Image segmentation

First, image segmentation was performed manually on one slice of Series 2 using ITK-SNAP (version 3.6.0). The ROIs on the different fruits were then interpolated with a linear method onto the same slice of the remaining series using MATLAB R2019a. To be conservative with the ROI, the threshold was set to 1. All interpolated ROIs were visually checked and corrected manually to exclude the fruit/air interface and discontinuities.

Image processing

In order to calculate the SNR of the original image, the mean intensity of a homogeneous region within a ROI (kiwi) is divided by the mean intensity of the background. These ROIs are shown in Fig. 2b. Because the mean of a Rayleigh distribution is \( \sqrt{\pi /2}\ \sigma \), where σ is the mode, the calculated SNR was further corrected by the factor \( \sqrt{\pi /2} \). Complex Gaussian noise was added to the original image and magnitude images were used for the analysis. Two noise levels (SNR 45 and 75) were generated from the original image whose SNR is 124. Ten different noise realizations were performed numerically for each SNR level in order to mimic test-retest imaging. Built-in MATLAB imdilate and imerode functions with a 3 × 3 structuring element were used to dilate and erode ROIs. The entire preprocessing was implemented in MATLAB (MATLAB R2019a).
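The SNR estimate and the noise addition can be sketched in Python as follows. This is one reading of the procedure rather than the study's MATLAB implementation, and it rests on three labeled assumptions: (1) the Rayleigh correction is applied by converting the background mean to the noise standard deviation σ before dividing, (2) the noise already present and the added noise combine in quadrature, and (3) the original magnitude image is treated as the real channel with zero phase. All function names are hypothetical.

```python
import math
import random
from statistics import fmean


def rayleigh_corrected_snr(signal_values, background_values):
    """SNR = mean signal / sigma, with sigma estimated from magnitude background."""
    # Background magnitude noise is Rayleigh distributed: mean = sqrt(pi/2) * sigma.
    sigma = fmean(background_values) / math.sqrt(math.pi / 2)
    return fmean(signal_values) / sigma


def added_noise_sigma(mean_signal, snr_original, snr_target):
    """Per-channel sigma to add so the image reaches `snr_target` (assumption 2)."""
    sigma_original = mean_signal / snr_original
    sigma_target = mean_signal / snr_target
    return math.sqrt(sigma_target ** 2 - sigma_original ** 2)


def add_complex_noise(magnitude_image, sigma, rng):
    """Add independent Gaussian noise to real and imaginary channels, return magnitude."""
    return [
        [math.hypot(v + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for v in row]
        for row in magnitude_image
    ]


rng = random.Random(0)  # fixed seed so a given noise realization is reproducible

# Sigma needed to bring an image with SNR 124 down to SNR 45 (signal level is a toy value).
sigma_add = added_noise_sigma(mean_signal=1240.0, snr_original=124.0, snr_target=45.0)

# Ten noise realizations of a tiny toy image.
image = [[1240.0, 1240.0], [1240.0, 0.0]]
realizations = [add_complex_noise(image, sigma_add, rng) for _ in range(10)]
```

Drawing each realization from the same generator with a different random state gives images that share the underlying signal but differ in noise, in the spirit of a test-retest acquisition.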

Feature extraction

A set of 56 features was extracted using IBSI-compliant in-house software (in MATLAB) partially adapted from the Vallières radiomics toolbox [61] and ImFEATbox [62]. Features are summarized in Table 2. Thirteen of the features were first order statistics based and 10 were 2D shape based, while texture features were computed from the grey-level co-occurrence matrix (GLCM, 22 features) and the grey-level run-length matrix (GLRLM, 11 features), each merged from all four 2D directional matrices. The definitions of the first order statistics based and texture features can be found in Parmar et al. [63], while the definitions of the 2D shape features can be found in van Griethuysen et al. [64]. Both first order and 2D shape features were implemented directly in MATLAB based on their definitions. For texture features, the GLCM and GLRLM matrix computation and the GLRLM feature extraction were adapted from the Vallières radiomics toolbox, while the GLCM features were adapted from ImFEATbox based on their definitions. Prior to calculating the texture matrices, all images underwent intensity discretization to 64 levels based on IBSI recommendations, with intensity values rescaled by mean ± 3SD or from zero to maximum intensity (to assess texture feature robustness on different discretization scales).
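The two discretization scales can be illustrated with a short sketch. It is written in Python with the standard library rather than the authors' MATLAB, and several details are assumptions not stated in the text: the function name `discretize`, the use of the population standard deviation, the clipping of intensities outside mean ± 3SD, and the fixed-bin-number rule \( \lfloor N_g (x - x_{lo})/(x_{hi} - x_{lo}) \rfloor + 1 \) capped at \( N_g \).

```python
import math
import statistics


def discretize(values, n_levels=64, mode="mean3sd"):
    """Map ROI intensities to integer grey levels 1..n_levels.

    mode="mean3sd":     range is mean - 3 SD to mean + 3 SD (values outside are clipped)
    mode="zero_to_max": range is 0 to the maximum intensity in the ROI
    """
    if mode == "mean3sd":
        mu = statistics.fmean(values)
        sd = statistics.pstdev(values)  # population SD: an assumption
        lo, hi = mu - 3.0 * sd, mu + 3.0 * sd
    elif mode == "zero_to_max":
        lo, hi = 0.0, max(values)
    else:
        raise ValueError(f"unknown mode: {mode}")

    if hi <= lo:  # flat ROI: everything falls in one level
        return [1] * len(values)

    levels = []
    for v in values:
        clipped = min(max(v, lo), hi)
        level = math.floor(n_levels * (clipped - lo) / (hi - lo)) + 1
        levels.append(min(level, n_levels))  # the upper edge belongs to the top level
    return levels


roi = [0.0, 10.0, 20.0, 30.0, 40.0]
levels_zero_max = discretize(roi, mode="zero_to_max")
levels_mean3sd = discretize(roi, mode="mean3sd")
print(levels_zero_max)
print(levels_mean3sd)
```

The same intensities land in different grey levels under the two scales, so the co-occurrence and run-length matrices built from them differ as well. That is why the choice of rescaling is treated as a separate robustness factor.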

Robustness analysis

Feature robustness was assessed using the ICC across different SNRs, different acquisition voxel sizes and ROI transformations, assuming that these variations introduce no consistent bias across ROIs. Each noise level, voxel dimension and ROI transformation was treated as a rater, and each intensity mask (containing the intensities of the selected voxels) was treated as a subject. Based on ICC reporting guidelines [65], ICC (2,1) was selected (“2-way mixed-effects model, single rater, absolute agreement”), as features are considered stable if their values remain the same across different variations. ICCs were calculated in MATLAB R2019a. For the SNR and ROI dilation/erosion analyses, 5 ROIs were analyzed at a single image resolution (0.5 mm × 0.5 mm × 2.0 mm) with 10 different noise realizations, resulting in 50 samples per image. Two groups were compared in each analysis (SNR = 45 versus SNR = 75, original ROI versus eroded ROI, original ROI versus dilated ROI). For the voxel size analysis, 5 ROIs were analyzed with 10 different noise realizations, resulting in 50 samples per image. These were analyzed across 4 different in-plane resolutions (0.47, 0.50, 0.56 and 0.67 mm). The ICC was assessed between groups for each calculated feature.
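For readers who want to reproduce this kind of analysis, a single-rater, absolute-agreement ICC can be computed from the two-way ANOVA mean squares as \( \mathrm{ICC} = \frac{MS_R - MS_E}{MS_R + (k-1)\,MS_E + \frac{k}{n}(MS_C - MS_E)} \), where \( n \) is the number of subjects, \( k \) the number of raters, and \( MS_R \), \( MS_C \) and \( MS_E \) are the subject, rater and residual mean squares. The sketch below is a Python illustration of that standard estimator, not the authors' MATLAB implementation. The function name `icc_absolute_single` and the toy data are hypothetical. In the setting of this study, each row would hold one feature value per intensity mask and each column one condition (for example SNR 45 and SNR 75).

```python
def icc_absolute_single(data):
    """Single-measurement, absolute-agreement ICC from a two-way ANOVA.

    data: list of n rows (subjects), each a list of k ratings (raters/conditions).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy example: 6 subjects rated under 4 conditions.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_toy = icc_absolute_single(ratings)

# A constant offset between two conditions lowers absolute agreement
# even though the ranking of subjects is preserved.
icc_offset = icc_absolute_single([[1, 2], [2, 3], [3, 4]])
print(round(icc_toy, 3), round(icc_offset, 3))
```

The second example shows why absolute agreement suits a stability analysis: a feature whose value shifts systematically between conditions is penalized, whereas a consistency-type ICC would still report perfect reliability.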

Availability of data and materials

All data will be provided upon written request.



Abbreviations

  • 1-nearest neighbor
  • Average correlation coefficient
  • Apparent diffusion coefficient
  • Concordance correlation coefficient
  • Computed tomography
  • Coefficient of variation
  • Dynamic contrast-enhanced
  • Differential subsampling with Cartesian ordering
  • Dynamic range
  • Dice similarity coefficients
  • Diffusion-weighted imaging
  • Fluid-attenuated inversion recovery
  • Field of view
  • Gray level cooccurrence matrix
  • Gray level run length matrix
  • Image biomarker standardization initiative
  • Intraclass correlation coefficient
  • k nearest neighbor
  • Linear discriminant analysis
  • Laplacian of Gaussian
  • Magnetic resonance imaging
  • Not applicable
  • Number of acquisitions
  • Proton density weighted
  • Peak enhancement
  • Positron emission tomography
  • Probability of error
  • Receiver operating characteristic
  • Region of interest
  • Sampling bandwidth
  • Signal enhancement ratio
  • Signal to noise ratio
  • Echo time
  • Repetition time
  • Within-subject coefficient of variation


References

  1. Rizzo S, Botta F, Raimondi S, Origgi D, Fanciullo C, Morganti AG et al (2018) Radiomics: the facts and the challenges of image analysis. Eur Radiol Exp 2(1):36.
  2. Gillies RJ, Kinahan PE, Hricak H (2016) Radiomics: images are more than pictures, they are data. Radiology 278(2):563–577.
  3. Kumar V, Gu YH, Basu S, Berglund A, Eschrich SA, Schabath MB et al (2012) Radiomics: the process and the challenges. Magn Reson Imaging 30(9):1234–1248.
  4. Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R et al (2009) New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 45(2):228–247.
  5. Aerts HJ (2016) The potential of radiomic-based phenotyping in precision medicine: a review. JAMA Oncol 2(12):1636–1642.
  6. Spak DA, Plaxco JS, Santiago L, Dryden MJ, Dogan BE (2017) BI-RADS® fifth edition: a summary of changes. Diagn Interv Imaging 98(3):179–190.
  7. Haralick RM, Shanmugam K, Dinstein I (1973) Textural features for image classification. IEEE Trans Syst Man Cybernet SMC-3(6):610–621.
  8. Galloway MM (1974) Texture analysis using grey level run lengths. Comput Graph Image Process 4(2):172–179.
  9. Thibault G, Fertil B, Navarro C, Pereira S, Cau P, Levy N et al (2013) Shape and texture indexes application to cell nuclei classification. Int J Pattern Recogn Artif Intell 27(1):1357002.
  10. Mallat SG (1989) A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell 11(7):674–693.
  11. Davnall F, Yip CS, Ljungqvist G, Selmi M, Ng F, Sanghera B et al (2012) Assessment of tumor heterogeneity: an emerging imaging tool for clinical practice? Insights Imaging 3(6):573–589.
  12. O'Connor JP, Rose CJ, Waterton JC, Carano RA, Parker GJ, Jackson A (2015) Imaging intratumor heterogeneity: role in therapy response, resistance, and clinical outcome. Clin Cancer Res 21(2):249–257.
  13. Jethanandani A, Lin TA, Volpe S, Elhalawani H, Mohamed ASR, Yang P et al (2018) Exploring applications of radiomics in magnetic resonance imaging of head and neck cancer: a systematic review. Front Oncol 8:131.
  14. Weinreb JC, Barentsz JO, Choyke PL, Cornud F, Haider MA, Macura KJ et al (2016) PI-RADS prostate imaging - reporting and data system: 2015, version 2. Eur Urol 69(1):16–40.
  15. Sun Y, Reynolds HM, Parameswaran B, Wraith D, Finnegan ME, Williams S et al (2019) Multiparametric MRI and radiomics in prostate cancer: a review. Australas Phys Eng Sci Med 42(1):3–25.
  16. Zhou M, Scott J, Chaudhury B, Hall L, Goldgof D, Yeom KW et al (2018) Radiomics in brain tumor: image assessment, quantitative feature descriptors, and machine-learning approaches. AJNR Am J Neuroradiol 39(2):208–216.
  17. Crivelli P, Ledda RE, Parascandolo N, Fara A, Soro D, Conti M (2018) A new challenge for radiologists: radiomics in breast cancer. Biomed Res Int 2018:6120703.
  18. Valdora F, Houssami N, Rossi F, Calabrese M, Tagliafico AS (2018) Rapid review: radiomics and breast cancer. Breast Cancer Res Treat 169(2):217–229.
  19. Liu CL, Ding J, Spuhler K, Gao Y, Serrano Sosa MS, Moriarty M et al (2019) Preoperative prediction of sentinel lymph node metastasis in breast cancer by radiomic signatures from dynamic contrast-enhanced MRI. J Magn Reson Imaging 49(1):131–140.
  20. Feng F, Wang P, Zhao K, Zhou B, Yao HX, Meng QQ et al (2018) Radiomic features of hippocampal subregions in Alzheimer's disease and amnestic mild cognitive impairment. Front Aging Neurosci 10:290.
  21. Zhang YY, Moore GR, Laule C, Bjarnason TA, Kozlowski P, Traboulsee A et al (2013) Pathological correlates of magnetic resonance imaging texture heterogeneity in multiple sclerosis. Ann Neurol 74(1):91–99.
  22. Feng R, Badgeley M, Mocco J, Oermann EK (2018) Deep learning guided stroke management: a review of clinical applications. J Neurointerv Surg 10(4):358–362.
  23. Kassner A, Thornhill RE (2010) Texture analysis: a review of neurologic MR imaging applications. AJNR Am J Neuroradiol 31(5):809–816.
  24. Schwier M, van Griethuysen J, Vangel MG, Pieper S, Peled S, Tempany C et al (2019) Repeatability of multiparametric prostate MRI radiomics features. Sci Rep 9(1):9441.
  25. Whybra P, Parkinson C, Foley K, Staffurth J, Spezi E (2019) Assessing radiomic feature robustness to interpolation in 18F-FDG PET imaging. Sci Rep 9(1):9649.
  26. Baessler B, Weiss K, Pinto Dos Santos D (2019) Robustness and reproducibility of radiomics in magnetic resonance imaging: a phantom study. Investig Radiol 54(4):221–228.
  27. Molina D, Pérez-Beteta J, Martínez-González A, Martino J, Velasquez C, Arana E et al (2017) Lack of robustness of textural measures obtained from 3D brain tumor MRIs impose a need for standardization. PLoS One 12(6):e0178843.
  28. Zwanenburg A, Leger S, Agolli L, Pilz K, Troost EGC, Richter C et al (2019) Assessing robustness of radiomic features by image perturbation. Sci Rep 9(1):614.
  29. Tanaka S, Kadoya N, Kajikawa T, Matsuda S, Dobashi S, Takeda K et al (2019) Investigation of thoracic four-dimensional CT-based dimension reduction technique for extracting the robust radiomic features. Phys Med 58:141–148.
  30. Mori M, Benedetti G, Partelli S, Sini C, Andreasi V, Broggi S et al (2019) CT radiomic features of pancreatic neuroendocrine neoplasms (panNEN) are robust against delineation uncertainty. Phys Med 57:41–46.
  31. Saha A, Harowicz MR, Mazurowski MA (2018) Breast cancer MRI radiomics: an overview of algorithmic features and impact of inter-reader variability in annotating tumors. Med Phys 45(7):3076–3085.
  32. Bologna M, Corino VDA, Montin E, Messina A, Calareso G, Greco FG et al (2018) Assessment of stability and discrimination capacity of radiomic features on apparent diffusion coefficient images. J Digit Imaging 31(6):879–894.
  33. Peerlings J, Woodruff HC, Winfield JM, Ibrahim A, Van Beers BE, Heerschap A et al (2019) Stability of radiomics features in apparent diffusion coefficient maps from a multi-centre test-retest trial. Sci Rep 9(1):4800.
  34. Aerts HJWL, Velazquez ER, Leijenaar RTH, Parmar C, Grossmann P, Carvalho S et al (2014) Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun 5(1):4006.
  35. Leijenaar RTH, Carvalho S, Velazquez ER, van Elmpt WJC, Parmar C, Hoekstra OS et al (2013) Stability of FDG-PET radiomics features: an integrated analysis of test-retest and inter-observer variability. Acta Oncol 52(7):1391–1397.
  36. Gudmundsson S, Runarsson TP, Sigurdsson S (2012) Test-retest reliability and feature selection in physiological time series classification. Comput Methods Prog Biomed 105(1):50–60.
  37. Lecler A, Duron L, Balvay D, Savatovsky J, Bergès O, Zmuda M et al (2019) Combining multiple magnetic resonance imaging sequences provides independent reproducible radiomics features. Sci Rep 9(1):2068.
  38. Fiset S, Welch ML, Weiss J, Pintilie M, Conway JL, Milosevic M et al (2019) Repeatability and reproducibility of MRI-based radiomic features in cervical cancer. Radiother Oncol 135:107–114.
  39. Duron L, Balvay D, Vande Perre S, Bouchouicha A, Savatovsky J, Sadik JC et al (2019) Gray-level discretization impacts reproducible MRI radiomics texture features. PLoS One 14(3):e0213459.
  40. Chirra P, Leo P, Yim M, Bloch BN, Rastinehad AR, Purysko A et al (2018) Empirical evaluation of cross-site reproducibility in radiomic features for characterizing prostate MRI. In: abstracts of SPIE 10575, medical imaging 2018: computer-aided diagnosis, SPIE, Houston, Texas, United States, 27 February 2018.
  41. Gourtsoyianni S, Doumou G, Prezzi TB, Stirling JJ, Taylor NJ et al (2017) Primary rectal cancer: repeatability of global and local-regional MR imaging texture features. Radiology 284(2):552–561.
  42. Zwanenburg A, Leger S, Vallières M, Löck S, for the Image Biomarker Standardisation Initiative (2016) Image biomarker standardisation initiative.
  43. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J et al (2017) Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol 14(12):749–762.
  44. Traverso A, Wee L, Dekker A, Gillies R (2018) Repeatability and reproducibility of radiomic features: a systematic review. Int J Radiat Oncol Biol Phys 102(4):1143–1158.
  45. Zhovannik I, Bussink J, Traverso A, Shi ZW, Kalendralis P, Wee L et al (2019) Learning from scanners: bias reduction and feature correction in radiomics. Clin Trans Radiat Oncol 19:33–38.
  46. Vuong D, Tanadini-Lang S, Huellner MW, Veit-Haibach P, Unkelbach J, Andratschke N et al (2019) Interchangeability of radiomic features between [18F]-FDG PET/CT and [18F]-FDG PET/MR. Med Phys 46(4):1677–1685.
  47. Papp L, Rausch I, Grahovac M, Hacker M, Beyer T (2019) Optimized feature extraction for radiomics analysis of 18F-FDG PET imaging. J Nucl Med 60(6):864–872.
  48. Forgács A, Béresová M, Garai I, Lassen ML, Beyer T, DiFranco MD et al (2019) Impact of intensity discretization on textural indices of [18F]FDG-PET tumour heterogeneity in lung cancer patients. Phys Med Biol 64(12):125016.
  49. Yip SSF, Aerts HJWL (2016) Applications and limitations of radiomics. Phys Med Biol 61(13):R150–R166.
  50. Traverso A, Kazmierski M, Shi ZW, Kalendralis P, Welch M, Nissen HD et al (2019) Stability of radiomic features of apparent diffusion coefficient (ADC) maps for locally advanced rectal cancer in response to image pre-processing. Phys Med 61:44–51.
  51. Um H, Tixier F, Bermudez D, Deasy JO, Young RJ, Veeraraghavan H (2019) Impact of image preprocessing on the scanner dependence of multi-parametric MRI radiomic features and covariate shift in multi-institutional glioblastoma datasets. Phys Med Biol 64(16):165011.
  52. Buch K, Kuno H, Qureshi MM, Li BJ, Sakai O (2018) Quantitative variations in texture analysis features dependent on MRI scanning parameters: a phantom model. J Appl Clin Med Phys 19(6):253–264.
  53. Yang F, Dogan N, Stoyanova R, Ford JC (2018) Evaluation of radiomic texture feature error due to MRI acquisition and reconstruction: a simulation study utilizing ground truth. Phys Med 50:26–36.
  54. Brynolfsson P, Nilsson D, Torheim T, Asklund T, Karlsson CT, Trygg J et al (2017) Haralick texture features from apparent diffusion coefficient (ADC) MRI images depend on imaging and pre-processing parameters. Sci Rep 7(1):4041.
  55. Guan Y, Li WF, Jiang ZR, Chen Y, Liu S, He J et al (2016) Whole-lesion apparent diffusion coefficient-based entropy-related parameters for characterizing cervical cancers: initial findings. Acad Radiol 23(12):1559–1567.
  56. Molina D, Pérez-Beteta J, Martínez-González A, Martino J, Velásquez C, Arana E et al (2016) Influence of gray level and space discretization on brain tumor heterogeneity measures obtained from magnetic resonance images. Comput Biol Med 78:49–57.
  57. Savio SJ, Harrison LCV, Luukkaala T, Heinonen T, Dastidar P, Soimakallio S et al (2010) Effect of slice thickness on brain magnetic resonance image texture analysis. Biomed Eng Online 9:60.
  58. Mayerhoefer ME, Szomolanyi P, Jirak D, Materka A, Trattnig S (2009) Effects of MRI acquisition parameter variations and protocol heterogeneity on the results of texture analysis and pattern discrimination: an application-oriented study. Med Phys 36(4):1236–1243.
  59. Collewet G, Strzelecki M, Mariette F (2004) Influence of MRI acquisition protocols and image intensity normalization methods on texture classification. Magn Reson Imaging 22(1):81–91.
  60. Mackin D, Fave X, Zhang LF, Fried D, Yang JZ, Taylor B et al (2015) Measuring computed tomography scanner variability of radiomics features. Investig Radiol 50(11):757–765.
  61. Vallières M, Freeman CR, Skamene SR, El Naqa I (2015) A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities. Phys Med Biol 60(14):5471–5496.
  62. Liebgott A, Küstner T, Strohmeier H, Hepp T, Mangold P, Martirosian P et al (2018) ImFEATbox: a toolbox for extraction and analysis of medical image features. Int J Comput Assist Radiol Surg 13(12):1881–1893.
  63. Parmar C, Rios Velazquez E, Leijenaar R, Jermoumi M, Carvalho S, Mak RH et al (2014) Robust radiomics feature quantification using semiautomatic volumetric segmentation. PLoS One 9(7):e102107.
  64. van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V et al (2017) Computational radiomics system to decode the radiographic phenotype. Cancer Res 77(21):e104–e107.
  65. Koo TK, Li MY (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15(2):155–163.




This work is in part funded by Walk-for-Beauty Foundation and Carol M. Baldwin Breast Cancer Research Foundation. None of the funding bodies participated in the design of the study, or collection, analysis, interpretation of data, or manuscript preparation.

Author information




All authors participated in the study design, data acquisition, manuscript preparation, data interpretation, literature review and summary. All authors approve the submitted version. Renee Cattell and Shenglan Chen contributed equally to this paper.

Corresponding author

Correspondence to Chuan Huang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Cattell, R., Chen, S. & Huang, C. Robustness of radiomic features in magnetic resonance imaging: review and a phantom study. Vis. Comput. Ind. Biomed. Art 2, 19 (2019).


  • Radiomics
  • Robustness
  • Magnetic resonance imaging
  • Imaging biomarker
  • Phantom study