Vector textures derived from higher order derivative domains for classification of colorectal polyps

Textures have become widely adopted as an essential tool for lesion detection and classification through analysis of lesion heterogeneities. In this study, higher order derivative images are employed to combat the challenge of poor contrast across similar tissue types in certain imaging modalities. To make good use of the derivative information, a novel concept of vector texture is first introduced to construct and extract several types of polyp descriptors. Two widely used differential operators, i.e., the gradient operator and the Hessian operator, are utilized to generate the first and second order derivative images. These derivative volumetric images are used to produce two angle-based and two vector-based (including both angle and magnitude) textures. Next, a vector-based co-occurrence matrix is proposed to extract texture features, which are fed to a random forest classifier to perform polyp classification. To evaluate the performance of our method, experiments are carried out on a private colorectal polyp dataset obtained from computed tomographic colonography. We compare our method with four existing state-of-the-art methods and find that ours outperforms the competing methods by 4%-13% as evaluated by the area under the receiver operating characteristic curve.


Introduction
Imaging of tissue textures has become a widely researched topic in the field of medical diagnosis in recent years. As machine learning methods have grown through more powerful computers and computational algorithms, research on tissue textures, particularly lesion textures, has grown rapidly because of the ever-expanding number of medical applications. One of the primary applications of texture analysis is lesion classification based on the heterogeneous characteristics of the image contrast within and surrounding the lesion [1,2]. Many studies have demonstrated that textures are an important expression of the heterogeneities of medical images, describing distinct morphological and phenotypic profiles, and have become a critical measure for differentiating benign from malignant lesions [3][4][5][6][7][8]. Using texture measures computed from medical images, tissue textures have demonstrated a powerful ability for computer-aided detection and diagnosis across a spectrum of diseases [9][10][11]. Among the most popular texture measures are the gray-level co-occurrence matrix (GLCM) measure [12], gray-level run-length features [13], and first order statistics features [14].
Research on tissue textures has shown the importance of the image intensity gradient within medical images [14][15][16][17]. By applying derivative operators to these medical images, the gradient information can be effectively encoded within the texture. The most notable examples of using gradient information for textures are the histogram of oriented gradients (HOG) features [18] and the co-occurrence of local anisotropic gradient orientations (CoLIAGe) features [19]. The HOG features bin the orientations of the gradients and use each histogram bin as an input feature for classification. The CoLIAGe features compute the local entropy using the co-occurrence matrix (CM) over patches of voxels and the gradient angular information. The computed local entropy values are binned into a histogram, and the histogram bins are used as input features for classification. These two methods demonstrate a way of using the directional information in the gradient domain and encoding it into texture measures. An alternative strategy was explored to investigate what additional information higher order derivative images can provide beyond the original intensity image [20]. The idea is to magnify the original image contrast distribution at different orders for different texture patterns, aiming to encode as much information as possible about the lesion heterogeneity into quantitative texture measures. By using the first- and second-order derivatives to obtain the corresponding gradient and curvature images, exploratory studies [20,21] provided an alternative way of encoding the higher order derivative image information into texture measures like the GLCM.

Visual Computing for Industry, Biomedicine, and Art *Correspondence: jerome.liang@sunysb.edu
While the above feature extraction methods have shown their potential in applications, they have their own limitations. For example, the HOG method lacks any operation to extract information from neighboring voxels as a CM-based method can. While the CoLIAGe method uses the CM, it limits itself to entropy values and considers only local patches. The previous methods [20,21] limit their use of the GLCM to only the magnitude of the higher order images.
To address the limitations of the above methods, we further explore the use of higher order derivative information for polyp description and classification via computed tomographic colonography (CTC). Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths worldwide, and accurate diagnosis remains a challenging task for radiologists [22]. While standard endoscopic colonoscopy is still the most popular screening method for CRC, CTC has grown to become a viable screening option to detect and diagnose both precursor polyps and cancerous lesions. CTC is a non-invasive procedure, which often makes it more palatable for patients, but unlike endoscopic colonoscopy, it cannot resect any concerning polyps. Accurate diagnosis through imaging textures of these polyp masses can better assist physicians in determining a plan of treatment while reducing costs for biopsy and pathology procedures [23]; computer-aided diagnosis of CTC polyps has been studied in refs. [21,24,25]. We focus on a dataset of polyp masses, i.e., polyps which have grown to be larger than 30 mm. These polyps require surgical intervention to remove, and the choice of how aggressively surgeons may cut into the colon for removal is determined by the malignancy of the lesion.
In this study, we expand upon the vector model introduced in ref. [26] to more deeply evaluate the image textures generated from the first- and second-order derivative information. Two differential operators, i.e., the gradient and Hessian operators, are employed to extract six local geometric measures, three of which are from the gradient operator and the other three from the Hessian operator. The three local geometric measures of the gradient operator are utilized to construct an angle-based and a vector-based (including both angle and magnitude) vector texture image (VTI). Similarly, the three local geometric measures of the Hessian operator are utilized to construct an angle-based and a vector-based VTI. From each VTI, a vector-based co-occurrence matrix (VCM) is proposed to compute a two-dimensional (2D) texture pattern along an angle through the VTI space, similar to the application of the GLCM to intensity images [12]. A series of 2D texture patterns is then obtained along different angles, called VCMs hereafter. From the computed series of 2D texture patterns or VCMs, several texture features can be derived and then fed to a classifier to perform the polyp classification.
The remainder of this study is organized as follows. The Methods/experimental section describes the methods used to generate the new vector texture features (VTFs), and the Results and discussion section presents the results obtained from all classification experiments using these new VTFs. Discussions and conclusions are drawn in the last section.

Methods/experimental
To overcome the challenge of limited soft tissue contrast in CT images, we first utilize the derivative operator to magnify the contrast. Then four VTIs are generated, two from the gradient operator and two from the Hessian operator. By applying the vector-based CM or VCM operator to each VTI, a series of VTFs is obtained and fed to a classifier. Figure 1 shows the flowchart of this work, outlining the methods used to generate the derivative images, convert them into the associated vectors, and then input them into a CM to form a set of vector textures.

Vector texture definition in gradient domain
Suppose a scalar function or intensity image $I = I(x, y, z)$ in three-dimensional (3D) space. Its gradient is represented by the following vector function:

$$\nabla I = (I_x, I_y, I_z) \qquad (1)$$

where $I_x = \partial I / \partial x$, $I_y = \partial I / \partial y$, and $I_z = \partial I / \partial z$ are the three partial derivatives. Since the intensity image is discrete rather than continuous, we employ the well-established Sobel operator kernels [27] to acquire the partial derivatives of the image. It is difficult to describe the object using the coordinates directly, since a coordinate is a relative metric which could change under different geometric transforms [28,29]. Fortunately, a vector can be expressed by the geometric metrics of magnitude and direction. In 3D space, the direction of the vector has two components, the azimuth angle and the polar angle, as shown in Fig. 2. To calculate the magnitude and direction of a gradient vector, a common method is to translate the vector from Euclidean coordinates to spherical coordinates as follows:

$$\nabla I = (I_x, I_y, I_z) = |\nabla I| \, (\cos\theta_g \cos\varphi_g, \; \sin\theta_g \cos\varphi_g, \; \sin\varphi_g) \qquad (2)$$

where $|\nabla I|$ is the magnitude, $\theta_g$ is the azimuth angle, and $\varphi_g$ is the polar angle. Their definitions are given below:

$$|\nabla I| = \sqrt{I_x^2 + I_y^2 + I_z^2} \qquad (3)$$
$$\theta_g = \arctan\left( I_y / I_x \right) \qquad (4)$$
$$\varphi_g = \arcsin\left( I_z / |\nabla I| \right) \qquad (5)$$

Unlike the CoLIAGe approach [19], where the azimuth and polar angles are used individually to extract texture features, we treat these two angles together as a gradient angle vector (GAV) of unit magnitude:

$$\mathbf{GAV} = \nabla I / |\nabla I| = (\cos\theta_g \cos\varphi_g, \; \sin\theta_g \cos\varphi_g, \; \sin\varphi_g) \qquad (6)$$

where the GAV contains only partial information of the gradient, assuming unit magnitude. In addition, we further employ the magnitude and direction together to compose a total gradient vector (TGV), which preserves all gradient information:

$$\mathbf{TGV} = (|\nabla I|, \; \theta_g, \; \varphi_g) \qquad (7)$$

Our next step is to quantify Formulas (6) and (7) to generate two VTIs. By their definitions, the range of the azimuth angle $\theta_g$ is $[0, 2\pi)$ and the range of the polar angle $\varphi_g$ is $[0, \pi)$. Their digitalization can be realized by:

$$\hat{\theta}_g = \left\lfloor \theta_g \, Q_g^a / (2\pi) \right\rfloor \qquad (8)$$
$$\hat{\varphi}_g = \left\lfloor \varphi_g \, Q_g^p / \pi \right\rfloor \qquad (9)$$

where $Q_g^a$ and $Q_g^p$ denote their gray levels, respectively, and $\lfloor X \rfloor$ represents the floor of $X$.
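The spherical conversion and angle digitalization described above can be sketched as follows. This is a minimal single-voxel illustration; the function names and the elevation-style polar angle (via arcsin) are assumptions of this sketch, not prescribed by the paper:

```python
import math

def grad_to_spherical(gx, gy, gz):
    # Convert a gradient vector (gx, gy, gz) into magnitude, azimuth
    # angle in [0, 2*pi), and polar angle, following the spherical
    # decomposition described above. The exact polar-angle convention
    # (elevation via arcsin) is an assumption of this sketch.
    mag = math.sqrt(gx * gx + gy * gy + gz * gz)
    theta = math.atan2(gy, gx) % (2 * math.pi)
    phi = math.asin(gz / mag) if mag > 0 else 0.0
    return mag, theta, phi

def quantize_angle(angle, angle_range, q_levels):
    # Digitalization: map an angle in [0, angle_range) onto q_levels
    # integer gray levels with a floor operation, guarding the
    # open-interval boundary.
    return min(math.floor(angle * q_levels / angle_range), q_levels - 1)

mag, theta, phi = grad_to_spherical(1.0, 1.0, 0.0)
print(quantize_angle(theta, 2 * math.pi, 10))  # azimuth pi/4 -> level 1
```

In a full pipeline these two steps would run per voxel over the Sobel-filtered volume, producing the quantized angle pair at every position.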
The range of the gradient magnitude tends to be very wide; however, most of that information is clustered within a narrow region, as can be seen in the distribution of magnitudes for our polyp dataset in Fig. 3b. Moreover, the CM is very sparse when computed from such an unbalanced value distribution. A non-uniform sampling based on histogram equalization [30] could be a solution, which would generate a very flat histogram as shown in Fig. 3c. However, this type of sampling treats every image component or region indiscriminately. Instead, we re-map the gradient magnitude by a t-th root:

$$I_g = \sqrt[t]{\,|\nabla I|\,} \qquad (10)$$

where $|\nabla I|$ is the gradient magnitude of the original gradient image and $t$ is an integer. Then a uniform gray scaling method is applied to $I_g$ by:

$$\hat{I}_g = \left\lfloor \frac{I_g - I_g^{min}}{I_g^{max} - I_g^{min}} \, Q_g^m \right\rfloor \qquad (11)$$

where $I_g$ is the re-mapped gradient magnitude defined in Formula (10), $I_g^{max}$ and $I_g^{min}$ are the maximum and minimum of $I_g$, and $Q_g^m$ is its maximum gray level number. The resulting histogram is shown in Fig. 3d.
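The t-th root re-mapping and uniform gray scaling described above can be sketched as a pair of small helpers (a minimal illustration over a flat list of magnitudes; the helper names are ours, not from the paper):

```python
import math

def remap_magnitude(mags, t=2):
    # t-th root re-mapping to compress the long tail of the
    # magnitude distribution before quantization.
    return [m ** (1.0 / t) for m in mags]

def gray_scale(values, q_levels):
    # Uniform gray scaling onto q_levels integer gray levels,
    # with a floor operation and a guarded upper boundary.
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on flat input
    return [min(math.floor((v - lo) / span * q_levels), q_levels - 1)
            for v in values]

levels = gray_scale(remap_magnitude([0.0, 1.0, 100.0], t=2), 4)
print(levels)  # -> [0, 0, 3]
```

Note how the square-root mapping pulls the outlier magnitude (100) close enough that the small values no longer collapse into a single bin as severely as under direct linear scaling.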
After the quantification via Formulas (8)-(11), we obtain two VTIs, corresponding to the definitions of GAV and TGV in the gradient image domain, as follows:

$$T_1 = (\hat{\theta}_g, \; \hat{\varphi}_g) \qquad (12)$$
$$T_2 = (\hat{I}_g, \; \hat{\theta}_g, \; \hat{\varphi}_g) \qquad (13)$$

Vector texture definition in Hessian domain
In mathematics, the Hessian operator of a scalar function or intensity image $I = I(x, y, z)$ in 3D space can be defined by a 2nd order derivative matrix as follows:

$$H(I) = \begin{pmatrix} I_{xx} & I_{xy} & I_{xz} \\ I_{xy} & I_{yy} & I_{yz} \\ I_{xz} & I_{yz} & I_{zz} \end{pmatrix} \qquad (14)$$

where $I_{xx}$, $I_{xy}$, $I_{xz}$, $I_{yy}$, $I_{yz}$, and $I_{zz}$ are the 2nd order derivatives of $I(x, y, z)$. To compute these partial derivatives of the polyp images, we use the well-established Deriche filters [31] with parameter $\alpha = 1$.
Due to the number of unique variables in the Hessian matrix, we use matrix decomposition to reduce the dimensionality and extract the three eigenvalues $(\lambda_1, \lambda_2, \lambda_3)$, which are treated as a vector and translated to spherical coordinates in the same manner as the gradient vector:

$$|\lambda| = \sqrt{\lambda_1^2 + \lambda_2^2 + \lambda_3^2}, \qquad \theta_h = \arctan\left( \lambda_2 / \lambda_1 \right), \qquad \varphi_h = \arcsin\left( \lambda_3 / |\lambda| \right)$$

Again, the Hessian angle vector (HAV) is defined as:

$$\mathbf{HAV} = (\lambda_1, \lambda_2, \lambda_3) / |\lambda| = (\cos\theta_h \cos\varphi_h, \; \sin\theta_h \cos\varphi_h, \; \sin\varphi_h)$$

The magnitude, azimuth angle, and polar angle compose the total Hessian vector (THV), given by:

$$\mathbf{THV} = (|\lambda|, \; \theta_h, \; \varphi_h)$$

Similar to the presentation in the gradient domain, we use the same method to perform the digitalization of $\theta_h$ and $\varphi_h$ in the Hessian domain by:

$$\hat{\theta}_h = \left\lfloor \theta_h \, Q_h^a / (2\pi) \right\rfloor, \qquad \hat{\varphi}_h = \left\lfloor \varphi_h \, Q_h^p / \pi \right\rfloor$$

where $Q_h^a$ and $Q_h^p$ denote their gray levels or quantification numbers, respectively, and $\lfloor X \rfloor$ is the floor as defined before.
The gray scaling of the Hessian magnitude is implemented by:

$$\hat{I}_h = \left\lfloor \frac{I_h - I_h^{min}}{I_h^{max} - I_h^{min}} \, Q_h^m \right\rfloor$$

where $I_h = \sqrt[t]{\,|\lambda|\,}$ is the re-mapped Hessian magnitude, similar to Formula (10), $I_h^{max}$ and $I_h^{min}$ are the maximum and minimum of $I_h$, and $Q_h^m$ is the maximum gray level number.
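A minimal sketch of reducing the Hessian matrix at one voxel to its eigenvalue vector, using NumPy's symmetric eigensolver. The function name and the descending sort order are our assumptions; the paper does not specify an eigenvalue ordering:

```python
import numpy as np

def hessian_eigen_vector(Ixx, Ixy, Ixz, Iyy, Iyz, Izz):
    # Assemble the symmetric Hessian at one voxel and collapse it
    # into its eigenvalue vector (lambda1, lambda2, lambda3),
    # sorted in descending order (an assumption of this sketch).
    H = np.array([[Ixx, Ixy, Ixz],
                  [Ixy, Iyy, Iyz],
                  [Ixz, Iyz, Izz]], dtype=float)
    # eigvalsh exploits symmetry and returns ascending eigenvalues
    return np.linalg.eigvalsh(H)[::-1]

lam = hessian_eigen_vector(2.0, 0.0, 0.0, 1.0, 0.0, 0.5)
print(lam.tolist())  # diagonal Hessian -> eigenvalues 2.0, 1.0, 0.5
```

The resulting eigenvalue triple is then fed through the same spherical conversion and quantization as the gradient vector.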
Thus, another two VTIs, corresponding to the definitions of HAV and THV in the Hessian image domain, are obtained as follows:

$$T_3 = (\hat{\theta}_h, \; \hat{\varphi}_h) \qquad (25)$$
$$T_4 = (\hat{I}_h, \; \hat{\theta}_h, \; \hat{\varphi}_h) \qquad (26)$$

Vector-based CM or VCM
Given the above vector discretization, we present a new method based on the traditional GLCM, called the vector-based CM or VCM, expressed as:

$$VCM_{T_i}(V_1, V_2 \mid d, \alpha, \beta) = \#\left\{ \left( p, \; p + d*(\alpha, \beta) \right) : \; p \in (R, C, S), \; T_i(p) = V_1, \; T_i\!\left( p + d*(\alpha, \beta) \right) = V_2 \right\} \qquad (27)$$

where $T_i$ represents the VTIs in 3D space, $i \in \{1, 2, 3, 4\}$, $(R, C, S)$ is the volume size, $V_1$ and $V_2$ are a vector pair in $T_i$, $d$ is the displacement between two voxels along the direction $(\alpha, \beta)$, and $d*(\alpha, \beta) = (d\cos\alpha\cos\beta, \; d\sin\alpha\cos\beta, \; d\sin\beta)$.
Given each of the four VTIs from Formulas (12), (13), (25), and (26), Formula (27) generates a VCM along a direction in a similar way as the Haralick method [12] does. The Haralick method generates a GLCM along a direction from the original intensity image, whereas our proposed method generates a VCM along a direction from the higher order VTIs. Because of this similarity, the Haralick method is implemented as a reference for comparison purposes.
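The VCM counting of Formula (27) can be sketched by brute force over a toy VTI. The nested-list layout, the dictionary-of-counts representation, and the pre-rounded integer displacement are simplifications we assume here:

```python
from collections import Counter

def vcm(vti, offset):
    # Count how often a pair of quantized vectors (V1, V2) co-occurs
    # at a fixed voxel displacement, in the spirit of a GLCM. `vti` is
    # a 3D nested list whose entries are tuples of quantized
    # components; `offset` is the displacement d*(alpha, beta) rounded
    # to integer voxel steps.
    dr, dc, ds = offset
    R, C, S = len(vti), len(vti[0]), len(vti[0][0])
    counts = Counter()
    for r in range(R):
        for c in range(C):
            for s in range(S):
                r2, c2, s2 = r + dr, c + dc, s + ds
                if 0 <= r2 < R and 0 <= c2 < C and 0 <= s2 < S:
                    counts[(vti[r][c][s], vti[r2][c2][s2])] += 1
    return counts

# A toy 1x1x3 VTI holding two distinct quantized angle vectors:
toy = [[[(0, 1), (0, 1), (2, 3)]]]
m = vcm(toy, (0, 0, 1))
print(m[((0, 1), (0, 1))], m[((0, 1), (2, 3))])  # -> 1 1
```

Because the co-occurring entries are tuples rather than scalar gray levels, a sparse dictionary is a natural stand-in for the dense matrix of the classical GLCM.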

Classification of the VTFs
Once the VTFs are computed for each polyp's VTI, we use the R package 'randomForest' to perform classification using a random forest (RF) [32,33], as it has been shown to be effective in previous polyp classification experiments [34,35]. Due to the limited data size of only 63 polyps, the RF method (like all machine learning algorithms) is susceptible to bias from the input data, or overfitting, depending on how it is divided into training and testing datasets. To reduce this overfitting effect, we utilize two different cross-validation methods to measure the robustness of the model. The first is a common two-fold cross-validation method where we randomly divide the total dataset into equally sized testing and training sets, with the criterion that the ratio of benign and malignant lesions is maintained in each set. For a small dataset such as ours, this method uses the least amount of training data and can therefore still lead to overfitting of the classification model. To alleviate this issue, we generate the random training and testing sets 100 times, and all results reported in the Results and discussion section are averages across all 100 randomly generated sets. For the second cross-validation scheme, we utilize a leave-one-out style approach where all except one polyp are used in the training set, and the left-out polyp is tested. This is repeated until every polyp has been left out of the training set once, and the final classification results summarize all iterations of the leave-one-out models. The results of the leave-one-out cross-validation method are detailed in the Appendix.
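The first cross-validation scheme, a random two-fold split that preserves the benign/malignant ratio, can be sketched as follows (the function name and seed handling are ours; the paper's implementation is in R):

```python
import random

def stratified_twofold(labels, seed=0):
    # Randomly split sample indices into two equally sized halves
    # while keeping the class ratio (benign vs. malignant) in each
    # half, as in the stratified two-fold scheme described above.
    rng = random.Random(seed)
    fold_a, fold_b = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = len(idx) // 2
        fold_a += idx[:half]
        fold_b += idx[half:]
    return sorted(fold_a), sorted(fold_b)

labels = ['benign'] * 10 + ['malignant'] * 6
train, test = stratified_twofold(labels)
print(len(train), len(test))  # -> 8 8
```

Repeating this split with 100 different seeds and averaging the resulting AUC scores mirrors the repetition strategy described above.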
To perform classification, we use the function 'randomForest' with the settings "ntreetry = 5000" and "doBest = True". To perform forward stepwise feature selection [36], we first run a preliminary classification using all available features and obtain the importance value of each feature by its GINI index [32]. The features are then sorted in decreasing order of GINI index value. The RF classification is run with the 3 highest-importance features, then repeated with an iteratively increasing number of features added in importance order [21]. This procedure is repeated for each of the 100 randomized sets of testing and training data. The results are evaluated for each group, and the reported values show the average maximum area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The results for the leave-one-out cross-validation method use this same feature selection procedure; only the distribution of training and testing sets is different.
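The forward stepwise selection loop can be sketched independently of the classifier. Here `score_fn` stands in for one full RF train/evaluate round, and the toy scorer is purely illustrative; both are assumptions of this sketch:

```python
def forward_select(importances, score_fn, min_features=3):
    # Rank features by a (GINI-style) importance, then grow the
    # feature set in importance order, keeping the subset with the
    # best score, as in the forward stepwise procedure above.
    order = sorted(range(len(importances)),
                   key=lambda i: importances[i], reverse=True)
    best_subset, best_score = None, float("-inf")
    for k in range(min_features, len(order) + 1):
        subset = order[:k]
        score = score_fn(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

def toy_score(subset):
    # Pretend features 0 and 2 are informative; penalize subset size.
    return sum(1 for i in subset if i in (0, 2)) - 0.1 * len(subset)

subset, score = forward_select([0.9, 0.1, 0.8, 0.2], toy_score)
print(subset)  # -> [0, 2, 3]
```

In the paper's setting, `score_fn` would train an RF on the candidate subset and return the validation AUC, and the loop would run once per randomized train/test split.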
Lastly, we note our choice of the RF classifier over other classifiers such as the support vector machine (SVM) [37] or K-nearest neighbors [38]. Through internal experiments, we have generally found that we achieve better classification with RF than with the other two methods. RFs have shown good classification performance in a variety of applications [39], and they have been found to outperform SVM in instances of low resolution, such as the dataset of polyp images used in this study [40]. Further, RF is probabilistic in its design and is not under the same linear constraints required by SVM for its hyperplane segmentation. This work demonstrates the feasibility and efficacy of the proposed vector textures through RF classification; comparisons across multiple classification modalities are subjects of future work.

Results and discussion
In this section, we first describe the dataset and evaluate two parameters for gradient and Hessian magnitude to obtain some preliminary results for further experiments. Then polyp classification is performed using the VTFs extracted from the four VTIs. At the end, we compare our classification results with four existing methods.

Polyp dataset
The dataset used for these experiments consisted of 59 patients with a total of 63 polyps found through CTC screening. These polyps were all 30 mm or larger and were scheduled for later surgical removal. When they were removed, the pathology reports were obtained to verify whether the polyps were indeed cancerous (adenocarcinoma) or benign/pre-malignant (adenomatous polyp). The breakdown of the dataset can be seen in Table 1. For classification, the dataset was divided into the binary categories of malignant (adenocarcinoma) and pre-malignant (all others).
The regions of interest (ROIs), used to compute the VTFs described below, were manually selected around the polyp region of the CTC image. For each polyp, a 3D volume was extracted and confirmed by radiologists to ensure the accuracy of the manual procedure. We note that a thresholding preprocessing step was used to discard all voxels below -450 HU within these ROIs as being predominantly air from the lumen of the colon. The information encoded in these voxels from partial volume effects is mostly air and therefore contributes more noise to the polyp features for classification. Since the first and second order derivatives apply kernels to the polyp images, the border voxels still use the intensity values provided outside of the ROI from the original CTC image. Examples of a polyp ROI slice are shown in Fig. 5 with each of the different image metrics used, i.e., the images in the original intensity domain, the gradient domain, and the Hessian (eigenvalue) domain. From the images, it is clear that each of the azimuth, polar, and magnitude images from the two derivative orders provides a different type of information or a different texture pattern. We hypothesize that, given the same polyp descriptor or VCM in this study, the extracted VTFs from the gradient domain would differ from those of the Hessian domain, reflecting the polyp information at different orders. While this dataset is small, we note that the medical application to these polyp masses can still be quite significant. These polyp masses are large enough that removal will always be necessary, but the type of treatment may change based on the pathology. For example, accurate classification may help guide physicians to prevent unnecessary biopsy procedures, which can reduce medical costs and potential risks to the patient.
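The air-thresholding step can be sketched as a simple filter over the ROI's voxel values (flattened here for brevity; -450 HU is the cutoff stated above, and the function name is ours):

```python
def keep_tissue_voxels(roi_values, hu_threshold=-450):
    # Drop voxels below the lumen-air cutoff so that partial-volume
    # voxels dominated by air do not add noise to the texture features.
    return [v for v in roi_values if v >= hu_threshold]

print(keep_tissue_voxels([-900, -451, -450, 40, 120]))  # -> [-450, 40, 120]
```

In the actual pipeline the discarded positions would simply be excluded from the VCM counting rather than removed from the volume, since the derivative kernels still need the surrounding intensities.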

Two parameters of magnitude images
Gradient magnitude and Hessian eigenvalue magnitude are two kinds of geometric measurements which represent the vector length. Since gradient and Hessian images both contain image edge information, they can enhance the contrast of the image to some extent, most notably at the boundaries. Therefore, their magnitudes may vary over a wide interval. However, most of the information is clustered in a very narrow interval, as shown in Fig. 3. Thus, when we employ the t-th root mapping of Formula (10) to alleviate this issue, it is necessary to determine how to set the value of t to obtain the best performance.
To this end, t is varied from 1 to 8 in steps of 1 to calculate the polyp descriptors according to Formulas (5) and (12) while the gray level is kept at 32. Table 2 lists the AUC scores for the magnitudes of the gradient and the three Hessian eigenvalues. Considering their averages and stability, we find that t = 2 and t = 4 are reasonable choices for the gradient and Hessian magnitude images, respectively, in the following experiments.
The second important parameter of the magnitudes is the gray level, which determines the magnitude images and the VCM size. A larger gray level will retain more details of the metric images, while smaller gray levels will over-smooth the VCMs, which affects the discriminative power of the texture features. Therefore, it is necessary to test the gray level's influence on the polyp classification, as shown in Table 3.

Outcomes of angle-based VCM
Both GAV and HAV are angle-based vector images which can be fed to the VCM using Formula (27) to calculate the eHMs after digitalization. GAV represents the gradient angle vector image while HAV reflects the novel angle vector image derived from the Hessian eigenvalue vector. Both contain the two components of the azimuth and polar angles. In the angular vector digitalization, there are four parameters corresponding to the four angles, i.e., $Q_g^a$, $Q_g^p$, $Q_h^a$, and $Q_h^p$; the classification outcomes under different settings are listed in Tables 4 and 5. Table 4 illustrates that the GAV is very robust when the gray level varies from 28 to 66. The mean AUC is about 0.94 while its standard deviation is mostly less than 0.03. Comparatively, the AUC scores of HAV in Table 5 show that the fluctuation of their averages is a little wider than for GAV, varying from 0.897 to 0.949. Meanwhile, most of the standard deviations are in the range of [0.03, 0.04]. From the best results of GAV and HAV, both reach a similar classification level as high as 0.949.

Outcomes of vector (angle and magnitude)-based VCM
Dissimilar to GAV and HAV, which only contain angles, TGV and THV consist of all information from the gradient and Hessian eigenvalues, i.e., the magnitudes together with the azimuth and polar angles. Both vector images have six parameters in their quantization, i.e., $Q_g^m$, $Q_g^a$, $Q_g^p$, $Q_h^m$, $Q_h^a$, and $Q_h^p$. As for the angle-based VCM, we set reasonable quantification levels for these parameters to generate non-sparse VCMs based on the best results of GAV and HAV. We hereby test TGV by changing $Q_g^m$ in the range {1, 2, 3, 4} while $Q_g^a$ is kept at 10 and $Q_g^p$ varies in the range {4, 5}, as shown in Table 6. Similarly, we test THV with the results shown in Table 7, where the t-th root parameter is equal to 4. To measure the classification performance, the AUC scores are reported to show their effectiveness. Table 6 demonstrates that TGV improves the average AUC scores and decreases their standard deviations compared with GAV's classification results. This means that all geometric components of the gradient can contribute to enhancing the discrimination in polyp classification. Additionally, we find a similar trend for THV's results in Table 7, whose best AUC scores exceed those of HAV by more than 1%.

Comparison to other methods
For reference on how well our proposed method performs, we compare our results to some similar texture extraction methods and a state-of-the-art deep learning method as follows.
• Extended Haralick features (eHF): extracts a set of 60 texture features from the GLCMs of the intensity image [21].
• HOG3D: counts the occurrences of gradient orientations in cropped portions of the intensity image and generates histograms which are joined to form gradient features [18].
• CoLIAGe: employs gradient angles to extract the entropy of every local patch and forms global texture features from two joint histograms [19].
• VGG-16: a widely cited deep learning method. A total of 20 salient slices were extracted from each polyp volume and fed to the VGG-16 pipeline for polyp classification [41].
Table 8 shows the results of the comparison to the texture extraction methods listed above, where the same RF classifier was used, and to the deep learning method. Our proposed method performed the best. When comparing to the extended Haralick feature method, the $T_4$-derived texture features outperformed the Haralick texture features with an AUC of 0.962 against 0.876. Compared against the other gradient angular features, our proposed angular texture features improved the performance substantially over the 3D HOG features (AUC = 0.804) and the CoLIAGe angular features (AUC = 0.923). The gain is also substantially higher over the VGG-16 outcome (AUC = 0.833). We note that deep learning methods [42,43] would be more comparable on a much larger dataset. Therefore, the eHF, HOG3D, and CoLIAGe models provide a more representative comparison and evaluation of the proposed vector textures, since they may be utilized under the same circumstances of fewer data entries. Using a Wilcoxon rank test, we obtained quantitative measures of the significant difference between the results of our proposed method and those of the reference methods, as listed in Table 9. In all except one instance, we find that our results performed significantly better (p < 0.05) than those of the comparison methods.

Conclusions
To enhance the image contrast of the original CTC polyp images, this study utilizes the gradient operator and Hessian operator to generate the corresponding gradient image and Hessian image. The gradient image is represented by a vector field. The Hessian image is represented, according to its definition, by a matrix field. To avoid the difficulty of manipulating matrices, we take the three eigenvalues of the Hessian matrix at each image voxel as a vector and reduce the matrix field to a vector field similar to the gradient field, so that all operations in the gradient domain can be adapted to the Hessian domain. In addition, a novel concept of VTIs is proposed through the use of vector geometric measures over the two vector fields in the corresponding gradient and Hessian domains, i.e., the GAV images ($T_1$), the TGV images ($T_2$), the HAV images ($T_3$), and the THV images ($T_4$). Moreover, another novel concept of the vector-based CM or VCM is introduced to extract 2D texture patterns from these 2D/3D VTIs. These 2D texture patterns or VCMs can be viewed as projections of the 2D/3D VTIs at different angles. From the projected VCMs, texture measures can be extracted as VTFs and classified by an existing classifier, such as RF. Experimental outcomes demonstrated that the proposed VTF extraction method can outperform state-of-the-art feature extraction methods for polyp classification.
The novel textures introduced in this work are based on the CM. We chose to focus on the CM because previous studies demonstrated that GLCM textures typically outperformed related matrix-based textures such as those from gray-level run-length matrices. While we only focused on the co-occurrence-based matrices here, we believe these other matrices can be another avenue for exploring these vector textures and will consider them in our future work. We also note that these vector textures place an emphasis on characterizing more microscopic properties of the lesions, since they are based on the higher order derivative information. For classification with handcrafted features, it is important to introduce as much relevant information to the classifier as possible while minimizing redundant information. Another topic for our future work will be the integration of these vector textures with other textures that emphasize more macroscopic lesion properties, such as the gray-level size zone matrix. Appropriate feature selection methods will also be examined to determine how best to integrate these novel textures into the already existing library of handcrafted features.

Appendix
Some experimental results using the leave-one-out scheme

The results of using a leave-one-out cross-validation method are presented below. For each model training instance in a leave-one-out method, only a single polyp is removed for the testing set at a time. The process is repeated until each polyp has been removed once, and the output classification probability of each polyp is used to evaluate the overall performance. Since the leave-one-out method uses all but one polyp in each instance, there is much more training data available to generate a model. This is especially important when the dataset is already small, with only 63 polyps. As can be seen in the tables below, the repeated experiments with a leave-one-out classifier generally perform better than the two-fold cross-validation equivalent, since there is more data in the training set each time.
The experiments for the results of Tables 4, 5, 6 and 7 of the two-fold cross validation method were repeated using the leave-one-out validation method and the