Multi-scale characterizations of colon polyps via computed tomographic colonography

Texture features have played an essential role in the field of medical imaging for computer-aided diagnosis. The gray-level co-occurrence matrix (GLCM)-based texture descriptor has emerged to become one of the most successful feature sets for these applications. This study aims to increase the potential of these features by introducing multi-scale analysis into the construction of GLCM texture descriptor. In this study, we first introduce a new parameter - stride, to explore the definition of GLCM. Then we propose three multi-scaling GLCM models according to its three parameters, (1) learning model by multiple displacements, (2) learning model by multiple strides (LMS), and (3) learning model by multiple angles. These models increase the texture information by introducing more texture patterns and mitigate direction sparsity and dense sampling problems presented in the traditional Haralick model. To further analyze the three parameters, we test the three models by performing classification on a dataset of 63 large polyp masses obtained from computed tomography colonoscopy consisting of 32 adenocarcinomas and 31 benign adenomas. Finally, the proposed methods are compared to several typical GLCM-texture descriptors and one deep learning model. LMS obtains the highest performance and enhances the prediction power to 0.9450 with standard deviation 0.0285 by area under the curve of receiver operating characteristics score which is a significant improvement.


Introduction
Colorectal carcinoma (CRC) is one of the top fatal diseases in the United States. American Cancer Society ranks CRC as the third most common cancer and the third leading cause of cancer-related deaths in both men and women [1]. There are two main categories of polyps, non-neoplastic and neoplastic. In general, the larger a polyp, the greater the risk of cancer is, especially with neoplastic polyps. Therefore, early polyp screening could effectively reduce the incidence of CRC [2,3]. Computed tomographic colonography (CTC) is a minimally-invasive, cheap and safe screening method for polyps. However, subtle lesion diagnosis from these CTC images is still very challenging even for radiologists [4][5][6]. Nevertheless, computer-aided diagnosis (CADx) via tumor heterogeneity has shown great potential to handle this challenge [7][8][9].
Tumor heterogeneity describes the observation that different tumor cells can show distinct morphological and phenotypic profiles. It has become a critical measure in benign and malignant differentiability. The lesion's heterogeneity is closely related to the lesion image textures (Fig. 1). However, texture pattern extraction remains a great challenge [10][11][12][13][14]. The method proposed by Haralick et al. [15], the gray-level co-occurrence matrix (GLCM)-based texture descriptor, is identified as a promising solution for this problem. GLCM-based textures have been a forerunner in this field and adapted to multiple diseases such as polyps, breast cancer, lung nodules, gliomas, bladder cancer, and imaging modalities including CT, magnetic resonance imaging, positron emission computed tomography [16][17][18][19]. In the past, Lam [20] extended gray level co-occurrence matrix (CM) by gradient magnitude to extract image textures and Guo [21] explored CM by Gaussian curvatures to construct shape descriptors. In the recent years, Song et al. [22] introduced some high order metrics, such as gradient magnitude and curvature, to expand Haralick features (HFs) in volumetric data for polyp classification. To further improve the distinctions of the Haralick measures in different directions, Hu et al. [23] used the Karhunen-Loeve transform (KLT) to map the Haralick measures into an orthogonal eigenspace.
The Haralick model defines and extracts some important texture patterns from images. These patterns reveal image intensity correlation for pixel pairs on each two-dimensional (2D) image slice. Nevertheless, descriptors computed using the Haralick model in the 2D presentation have certain limitations. The model analyzes the nearest neighboring pixel in four different directions which is described in Section 2. The HFs are often extracted to construct rotational invariant descriptors which are formed by the means and ranges of Haralick measures along those four directions. However, the potential drawbacks of four-directionalaveraging in a 2D digital image lack rotational robustness. On the other hand, the traditional Haralick model always counts all pixel pairs and calculates their distribution over all slices by full sampling which could result in redundant information and weaken the model's performance. The third shortcoming of the Haralick model is the consideration of the nearest neighboring pixel to construct texture features: not considering other displacements may limit the potential to further extract textural patterns.
In this paper, we modify the definition of GLCM by adding a new variable -stride, and introducing multiple scaling analysis into the texture descriptor construction via GLCM. To address the weaknesses of the Haralick model, three schemes associated with each of the variables in GLCM, i.e., displacement, stride and angle, are devised to evaluate the CM-texture descriptors. Each scheme seeks to increase texture patterns through multiple scaling analysis while being mindful of texture information redundancy associated with the learning method. Furthermore, we intend to find out which variable would be more sensible in a multi-scale framework. Six classification schemes are designed for our investigation by random forest (RF).
The remainder of this paper is organized as follows: Section 2 describes and reviews the baseline Haralick model and proposes our new adaptive sampling model. Section 3 includes the analysis on the design and the results of our method. The last section includes some discussions and conclusions.

Methods
This section begins with a review of the basic Haralick model in 2D and 3D space. Then, the proposed multiple scaling gray-level co-occurrence model (MSGLCM) is presented. The individual parameters from MSGLCM are each evaluated independently in three learning models, namely by multiple displacements, multiple strides and multiple angles.
2D/3D Haralick model The Haralick model was proposed to extract polyp texture information from intensity images because of its strong ability to discriminate polyp pathologies [22,23]. This model's pipeline includes calculating image metrics such as its intensity, gradient and curvature, etc., image metric digitalization, GLCM computation, Haralick measure and feature definition, and image descriptor construction. The GLCM computation defines and extracts important texture patterns (distribution of pixel-pairs) from one image along different directions (Fig. 2).
The method provides 14 measures for every matrix computation. In a 2D gray image, four directions (0°, 45°, 90°and 135°) are analyzed (Fig. 2a). From each direction, one image would generate HFs consisting of 28 texture variables, i.e., 14 means and 14 ranges which would be used to construct the texture descriptor. In contrast, the number of directions in volumetric data is 13 (Fig. 2b). Hu et al. [23] expanded this model to generate 30 measures, referred to as the extended Haralick measures (eHM), to capture more texture information from volumetric data. Unlike the Haralick model, they employ all measures to form the texture descriptor instead of HFs.  The proposed method utilizes three primary variables for the multi-scaling model, i.e., displacement scaling, stride scaling, and angle scaling. Using these values, the equation for the multi-scaling GLCM (MSGLCM) can be presented as below: I represents the grayscale image, (M, N) is the image size, i and j are a pair of image pixel values, d is the displacement between two pixels along the angle θ, and s represents the stride. A pictorial illustration of Eq. (1) is shown in Fig. 3. When s = d = 1, the MSGLCM should be the traditional GLCM model. This model provides a new tool to capture more texture patterns at multiple scales. A typical example of MSGLCM calculation is shown by Fig. 3 where the stride is equal to 5. The MSGLCM model for 3D volumetric data is similar to the 2D model except that its coefficients are bidirectional.
According to MSGLCM definition, there are three important variables in the learning model, i.e., displacements, strides and angles. Each variable would be investigated individually and expanded to larger magnitudes to determine their individual behavior in the model. The following subsections present the methods where each of these three parameters are investigated for the contribution to the multi-scaling framework.

Learning model by multiple displacements
The traditional Haralick model has a sampling distance of 1. In medical images, there are more complex textures and using a displacement of 1 might limit the information used to define texture patterns. To evaluate the effect of displacement on the texture pattern, the other two coefficients, i.e., angle and stride, are fixed as follows: In learning model by multiple displacements (LMD), we adopt an up-sampling method to get more texture patterns. Considering the small volumes of the polyps, large displacements are not ideal while calculating MSGLCM. Smaller displacements, i.e., 1, 2, 3, are used in this exploration study (Fig. 4).
The calculation produces three matrix sets for three displacements. Each matrix set contains 13 matrices associated with 13 digital angles [23]. This method generates more texture patterns and texture descriptors for polyp classification compared to the traditional Haralick model.

Learning model by multiple strides
With the increased information that can be extracted with the MSGLCM model compared to the traditional method, the stride can be used as a form of down sampling to control multiple scaling implements while calculating the CM. Suppose the current position is (x, y); the next position for the model would be (x + stride, y) in the row, or (x, y + stride) in the column. A similar technology can be found in deep learning [24,25]. In this scheme, stride is the variable which is kept for evaluation while the displacement and angle are constants as described by the following equation.
where d 0 is the fixed displacement and θ 0 has 13 alternatives as shown in Formula (2). This method is similar to LMD with the addition of the stride analysis for the MSGLCM calculation. Unlike displacement, increasing the stride will lead to a downsampling process. Likewise, smaller strides are considered more ideal for MSGLCM calculation since the sizes of the polyps are always small. The strides that are evaluated for this method will be limited by the size of the region of interest (ROI) volumes used. The base model In image classification, the performance is significantly determined by some key features. The traditional full sampling will generate more redundancy while decreasing the ratio of key features, which will hurt the clustering performance. This method provides a solution via decreasing the sampling frequency over the image to lessen the number of non-critical features. Therefore, the down-sampling method intends to enhance the roles of key features in polyp classification to improve the clustering results.

Learning model by multiple angles
Angle sampling rate in a 3D image array can mitigate sparse directions in the model by including higher orders of neighbors in CM. The angles in digital images or volume data are discretized and as a result, increasing the digital angles requires more displacements in the digital domain. Similar to the previous designs, i.e., LMD and learning model by multiple strides (LMS), the displacements used to evaluate the new design are 1, 2 and 3 due to concern for polyp size (i.e., 3 mm and larger).
The following equation describes the MSGLCM with variable angles and a fixed stride.
where θ is the digital angle represented by some 3D vectors similar to Formula (2). It is easy to see this is an up-sampling model similar to LMD. Furthermore, LMD is a subset of learning model by multiple angles (LMA). Each displacement could generate a set of angles (Fig. 6a). The angles of different displacements are listed in Table 1. To examine the behavior of multiple angles, the displacement and the stride will be set as 1 for the base model. Further observation will include increasing the displacement and increasing the stride. Note that some angles in digital images can be duplicated as we include more directions while increasing displacements (Fig. 6a). To investigate the impact of these repeated angles in polyp classification, they are removed in another scheme, as shown in Fig. 6b.
All the proposed models could be able to generate new texture information different from the traditional Haralick model via multi-scaling on displacements, strides and angles. However, with the increased pool of information, the texture patterns would bring not only more useful information but also some redundancies. This can potentially lead to overfitting problems in polyp categorization which could lower the clustering performance and consequentially hurt the classification. There are numerous debates on this topic which could be solved by appropriate feature selection methods [26][27][28].

Polyp descriptors and classifier Polyp descriptors
Polyp descriptors are numeric descriptions in the form of scalars, vectors, or matrices that describe a polyp extracted from a polyp image or volume. In this article, the eHM are utilized to construct the polyp descriptors [23]. For MSGCLM, eHM defines 30 measures that expands the 14 traditional Haralick measures with 16 new measures. However, the 21th measure which represents cluster average is always equal to 0, and the 25th and 30th measures are equivalent after formula simplification. Therefore, the descriptor will include 28 measures for one direction. For multiple angles, the vector will have N * 28 variables to represent a polyp where N is the angle number.

Classifier and feature selection
Classification is one of the most effective tools for identifying descriptors. Its major task is to identify general patterns belonging to one category. The simplest case is binary classification which creates a function g : x → {1, −1}, where g is a classifier [29].
RF classification is derived from a decision tree (DT) method [30]. Unlike DT, RF will apply many trees to train and test the samples, then a voting method is used to get the probability from these trees. Another distinction is the random sampling in the tree construction that includes randomly splitting features, combinations  of features and choosing the threshold. For each process of RF, the descriptors of all polyps are divided into training groups and testing groups. Before classification, we first calculate the priority of each variable in the texture descriptor. GINI coefficient is introduced to be the priority measurement in our method. Then some variable sets are generated using the forward step feature selection method on the ranked variables [23,31]. Thereafter, classifications are performed on each variable set under the parameter of 2000 trees and ffiffiffiffiffiffiffiffiffiffiffi ffi NÃ28 p candidate variable number. We utilize the area under the curve of receiver operating characteristics (AUC) to be our evaluation measurement. The feature set with the highest AUC score would be taken to be the optimized texture descriptor.

Some operations for volume of interests and digital angles
Before differentiating the types of polyps, each polyp's position (x, y, z) in a volumetric data was labelled by radiologist experts. Next, a semiautomatic performance is adopted to crop the polyp patches on every image slice. For that purpose, the labeled polyps are outlined manually to generate ROIs on all slices according to the labelled location. The polyp locations are continuous: located on every slice and form a volume of interest (VOI). Due to the manual labelling, the resulting VOIs include additional information such as air. To separate the air from the polyp, an adaptive air-cleansing algorithm is employed to eliminate those voxels that contain predominately air [32].
The digital angles are defined in accordance to the grid structure of a digital image. As a result, the distance for each digital angle may not always be integers. To address this issue, vectors are used to provide information on angle and magnitude which correspond to the angle and displacement in the proposed models.
The traditional GLCM is calculated including the inverse angles to produce a symmetric matrix. The Haralick measures are symmetrically invariant; therefore, the matrix and its symmetric iteration can produce the same measures. To reduce redundant measures, the inverse directions are excluded from the digital angles. Only four angles are included in the 2D Haralick model corresponding to (1, 0), (1, 1), (0, 1), and (1, − 1). Similarly, the extended Haralick model in 3D would include more directions with the increase of the displacement.

Polyp dataset
A private dataset containing 59 patients with a total number of 63 polyp masses is used for the experiments of this study. All the polyp masses are at least 30 mm in diameter. Each polyp was identified by radiologist experts on CTC and optical colonoscopy. All the patients were scheduled for surgical removal intervention after detection and confirmation. When the polyp masses were removed, all pathology reports were obtained to verify whether each of the polyp masses was indeed a cancerous (adenocarcinoma) or benign (adenomatous) polyp. The breakdown of the dataset can be seen in Table 2. To benefit surgical intervention, it is important to know the malignant risk of each polyp mass. Given the pathology reports, these polyp CTC scans provide an excellent database to develop machine learning strategies to predict adenocarcinoma for more aggressive removals. In addition to direct clinical impact, this database also provides good opportunities to evaluate different machine learning strategies regarding to pathological ground truth. This study is an example of evaluating methodology development for polyp classification using the pathologically approved database.
Classification needs two sub-datasets, the training dataset and the testing dataset. From the polyp mass database, we randomly selected 15 samples from the benign polyps and 16 from the malignant polyps for training. The remaining polyps are used for testing. Thus 31 polyps are used for training and 32 for testing (Table 3). Repeating the random sampling method, we generated 100 unique iterations for training and testing groups.

Experimental outcomes
According to the three multi-scale models, six testing schemes are designed. We test the three models separately. Then three hybrid experiments are designed and implemented.  Table 4. Their results tell us that the stride is more effective to improve the descriptor distinction than the other two parameters of displacement and angle since its AUC score is improved by about 6%. Compared with eHM (baseline), the LMD and LMA are almost even and do not bring much gain for polyp classification when we change the displacement and the angle numbers independently.

Hybrid results of LMD + LMS
In this experiment, LMD and LMS are combined to extract some new texture patterns. There are 3 different displacements and 9 strides involved in texture descriptor construction. Hence, 27 kinds of polyp descriptors are generated. The number of angles is kept constant with 13 directions. After training and testing via RF, their AUC scores are calculated and illustrated in Table 5. It retells us that the stride is more sensitive than the displacement. With the stride increasing, we see that the classification performance generally increases in a staggering fashion. However, the trend of AUC scores on each row are gradually declining while the displacement is growing which means LMD introduces more redundant information for polyp classification. From these results, there is an increasing trend with odd strides as the stride approaches 7. Similar outcomes are reached with the increasing displacements. These outcomes demonstrate that the LMA model with a larger stride could produce more critical features and get much better performance for polyp classification. Figure 7 is plotted to illustrate the AUC score changes according to stride and displacement.

Hybrid results of LMS + LMA
In this experimental scheme, we try to combine LMS and LMA to investigate the second hybrid model with three parameters. The angle sampling method in Fig. 6 shows that more digital angles need more displacements which determines the digital angle number under full sampling. That means angle group and displacement exists in a one-to-one relationship under a full sampling scheme in digital images. Therefore, this type of hybrid model contains two parameters, i.e., angle number and strides. Moreover, the previous schemes indicate that the displacement does not obtain any benefit, while the stride produced significant impact on the AUC score. Therefore, the following scheme keeps 3 displacements while the stride varies from one to nine. The results of the scheme with duplicate angles are described in Table 6 and the scheme without duplicate angles is described in Table 7.
The best performing model follows the same convention from the previous model with an AUC of 0.9450 for angle = 13 and S = 7. However, the average AUC tells us that this angle group is not stable with the stride varying, as shown in Tables 6 and 7. Considering the stability of the model, the group with 62 angles shows some advantages over others. Its averaged AUC score reaches 90.55% with the smallest standard deviation 0.0378. The results of 62 angles also indicate that the 1st and 2nd nearest neighbors contain more distinctive texture descriptors while the third nearest neighbor brings  redundant information which hurts the classification performance to a small extent. Compared to the first row in Table 5, the AUC scores improved about 1%-2% with increasing directions. The multi-stride continues enhancing this criterion to 94% when s = 7.

Comparisons
To illustrate the efficiency of our method, some typical methods are introduced to compare with our models. These methods are listed as the following.
HFa typical method was proposed to construct the texture descriptor consisting of 28 HFs extracting from CM [15]. eHMa new model introduced 30 measures to represent texture characteristics extracted from CM [23].
K-L transform based eHM (eHM + KLT)this method introduced K-L transform to enhance the distinction between two different image features and reduce variation [23]. Co-occurrence of local anisotropic gradient orientation (CoLIAGe)this model employed gradient angles and extracted the entropy of every local patch to form a global texture descriptor by two joint histograms [33]. VGG16this method extracts 20 salient slices from every polyp volume to feed to VGG16 for polyp classification [34].
We choose two results from two hybrid results of our method for this comparison, i.e., LMD + LMS with stride = 7 and D = 1, LMS + LMA with angles = 62 and stride = 7. Their receiver operating characteristic curves are plotted in Fig. 8 which illustrates their different performances for polyp classification. Moreover, their AUC    scores, accuracies, sensitivities and specificities are also listed in Table 8 for further evaluation. To verify their differences, P-values are calculated using t-test to determine if our method is significantly different from others, as shown in Table 9. All the P-values are far smaller than 0.05 which indicates that the proposed methods are distinctive to all the typical methods.

Conclusions
This paper reviews the properties and evaluates the potential of the Haralick model by examining several weaknesses observed in practice [22,23]. The multi-scale gray level co-occurrence matrix (MSGLCM) is proposed and aims to improve this model by incorporating the multi-scale analysis technique with GLCM to evaluate the three variables: displacement, stride, and angular directions. MSGLCM combines the stride and down-sampling technology to emphasize the unique features and lessen the number of non-critical characteristics to improve polyp classification performance. Meanwhile, MSGLCM adopts up-sampling techniques to integrate the displacement and angle to get new texture patterns which could mitigate the sparse sampling problem within the GLCM calculation. With the increase of texture patterns and texture descriptors, the forward step feature selection method is applied to solve the informative redundancy and overfitting issues in polyp classification over 63 polyp masses: including 32 invasive adenocarcinomas and 31 benign adenomas. Experimental results reveal that increasing stride can significantly improve polyp classification over the traditional HF and eHM. On the other hand, displacement has little if any positive effect on the results on its own. With the addition of increasing displacements whilst preserving the lower displacements, there were varying results which demonstrates that there can be potential gains in additional displacements. This proposed model can achieve higher AUC values compared to the typical methods discussed in section 3.3. The best model from our experiments had a 6.23% improvement and reduced the standard deviation by 34.95% which is a significant advantage over them.

Discussion
Why the stride is more sensitive than the displacement and angle is still a question for us. The reasons might be guessed from two aspects. The type of polyp texture might be the first reason. The polyp texture should belong to one type of stochastic texture which has no apparent textural structures [35][36][37]. This type of texture is not sensitive to the changes of directions and displacement because of its isotropy. The second reason might be informative redundancy introduced by multiple angles on stochastic texture. The multi-angle sampling produced too many similar texture patterns from polyps.
Since the classification and recognition should depend on some unique features which should play a key role in it, these unique features always make up a very small proportion in the whole feature space [38]. The traditional full-sampling or up-sampling technique makes its proportion much smaller. The stride seems to lessen the   non-critical texture patterns and improve the ratio of unique features while down-sampling. We found that the higher sampling rate from a low stride may drown out those texture patterns that are necessary for distinguishing between pathologies.
In summary, our proposed model has shown encouraging performance. Nevertheless, redundant information and over-fitting issues by descriptor variables still face great challenges which need new feature selection technologies to solve. Further investigation is required in implementing convolutional neural networks to solve polyp classification on a small database [25]. Both of these are important tasks for our research efforts in the future.