Omni-dimensional dynamic convolution feature coordinate attention network for pneumonia classification

Abstract

Pneumonia is a serious disease that can be fatal, particularly among children and the elderly. The accuracy of pneumonia diagnosis can be improved by combining artificial-intelligence technology with X-ray imaging. This study proposes X-ODFCANet, which addresses the issues of low accuracy and excessive parameters in existing deep-learning-based pneumonia-classification methods. The network incorporates a feature coordinate attention (FCA) module and an omni-dimensional dynamic convolution (ODConv) module, leveraging a residual backbone for feature extraction from X-ray images. The FCA module utilizes two one-dimensional feature-encoding processes to aggregate feature information from different spatial directions. Additionally, the ODConv module extracts and fuses feature information in four dimensions: the spatial dimension of the convolution kernel, the numbers of input and output channels, and the number of convolution kernels. The experimental results demonstrate that the proposed method effectively improves the accuracy of pneumonia classification, achieving an accuracy 3.77% higher than that of ResNet18 with only 4.45M parameters, approximately 2.5 times fewer than ResNet18. The code is available at https://github.com/limuni/X-ODFCANET.

Introduction

Pneumonia is an acute respiratory infection caused by viruses, bacteria, or fungi that results in inflammation of the lungs and interstitial changes, leading to lung-tissue damage and even death. Pneumonia is the leading cause of death among children under the age of five, killing more children than any other infectious disease: it claims the lives of over 700,000 children under 5 years old annually, or approximately 2,000 children every day [1, 2]. Among individuals over 75 years of age, pneumonia is also a significant contributor to mortality, with an estimated 100,000 deaths annually [3]. COVID-19, caused by a novel coronavirus that emerged in 2019 and spread worldwide, has become a danger to human health. The virus is transmitted between people through respiratory droplets or close contact with contaminated surfaces [4]. Infected people develop symptoms such as fever, cough, and respiratory distress 2-14 days after exposure to the virus [5]. Therefore, timely detection and treatment are of great importance for patients with pneumonia.

Currently, chest X-ray (CXR) [6], computed tomography (CT) [7], and magnetic resonance imaging (MRI) [8] are commonly used in hospitals for the diagnosis of pneumonia. Doctors generally make a comprehensive diagnosis of COVID-19 by combining antigen testing, nucleic-acid testing, CXR, CT, and MRI. The Zhongnan Hospital of Wuhan University suggested that pneumonia may be diagnosed by assessing clinical symptoms together with the radiological manifestations of pneumonia [9]. In addition, Ai et al. [10] showed that CT has high sensitivity for the diagnosis of COVID-19 and can be used as the primary diagnostic tool. Despite the high sensitivity of CT in detecting chest abnormalities [11], its use still faces some challenges; for example, CT scanners are non-portable and require sterilization of the device and imaging room after each patient. Moreover, the radiation dose is considerably higher than that of X-rays [12], and CT is not recommended for children because they are more sensitive to radiation. Compared with CT, portable CXR equipment is readily available, inexpensive, and present in most primary-care hospitals [13]. Additionally, CXR can be performed in isolation rooms; its use in hospitals therefore reduces staff exposure to viruses and the risk of infection [14]. Consequently, CXR is considered the main method for the clinical diagnosis and management of children with suspected COVID-19 [15, 16].

The current diagnostic approach to pneumonia relies mainly on the subjective experience of clinicians, and the clinical symptoms of COVID-19 are very similar to those of pneumonia caused by other viruses or bacteria. Some patients present only mild symptoms; thus, distinguishing COVID-19 from other common types of pneumonia is difficult using clinical features alone [17]. Accordingly, computer-aided diagnosis has played an important role in medical research, clinical disease diagnosis, and treatment in recent years, for example, in the early detection of arthritis and chest diseases [18, 19].

Convolutional neural networks are particularly adept at classification tasks, but their performance in pneumonia classification is constrained by the fixed nature of their convolution kernels, which hinders adaptation to complex feature extraction, and by the lack of explicit channel-importance weighting. To overcome these issues, this study proposes X-ODFCANet, which comprehends spatial information and employs flexible, adaptive convolution kernels to classify pneumonia from input CXRs. First, ResNet18 is used as the backbone to effectively avoid the gradient-explosion and vanishing-gradient problems caused by increasing the number of middle layers of the network [20]. Second, the feature coordinate attention (FCA) module added to the network enhances the localization of significant features in the image and highlights the significance of spatial information. This enables the model to better comprehend and utilize the spatial structure of an image without introducing excessive parameters. Additionally, it enables the network to handle images with multiscale information effectively and improves the generalization ability of the model. Third, the omni-dimensional dynamic convolution (ODConv) module embedded in our network allows the model to adjust the convolution kernel in different dimensions to capture rich information and better capture long-range dependencies in the input data. Its dynamic nature enables the network to select and adjust the convolution kernel automatically during the learning process while maintaining computational efficiency. Finally, the effectiveness of the method is verified using a public dataset. Experiments show that the proposed method can classify pneumonia with high accuracy and a low number of parameters.

The main contributions of this study are as follows:

  • A network named X-ODFCANet is proposed for pneumonia classification. The integration of the FCA module enhances feature localization and the use of spatial information, improves spatial-structure comprehension without introducing excessive parameters, and enhances network generalization.

  • The integration of the ODConv module facilitates adaptive convolution-kernel adjustments in different dimensions, capturing rich information and long-range dependencies, while maintaining computational efficiency.

  • Experiments on the public dataset show that our proposed method can classify pneumonia with high accuracy and a low number of parameters.

Attention mechanism

ResNet treats all feature channels equally, without differentiating the importance of each channel. This can dilute the focus on channels that carry more relevant information for specific tasks such as pneumonia detection, where certain features, e.g., the textures and shapes of lung opacities, are more informative than others. Attention mechanisms are generally integrated along either the channel or the spatial dimension. Hu et al. [21] proposed a channel-attention module, the squeeze-and-excitation (SE) module. As shown in Fig. 1, the SE module operates in two stages. (1) The squeeze operation compresses the spatial dimensions of the input features through a global pooling layer, and the resulting channel descriptor is reduced by a fully connected (FC) layer. (2) The excitation operation applies a nonlinearity and an activation function to restore the channel dimensionality, ensuring consistency between the input and output channels, and yields the channel-attention weights. Finally, these weights are multiplied channel-wise with the original features, completing the reweighting performed by the SE module (a code sketch follows Fig. 1).

Fig. 1 SE module structure
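A minimal PyTorch sketch of this squeeze-and-excitation reweighting is shown below; the reduction ratio of 16 is the default from the original SE paper [21] and is an assumption here, not a setting reported in this study.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Sketch of the SE module in Fig. 1: squeeze (GAP) -> excitation (FC-ReLU-FC-Sigmoid)."""
    def __init__(self, channels: int, reduction: int = 16):  # reduction=16 is an assumption
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: compress spatial dims to 1x1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # reduce channel dimension
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # restore channel dimension
            nn.Sigmoid(),                                # per-channel weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)  # excitation
        return x * w  # reweight the original features channel-wise
```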

However, because the SE module adjusts only the channel dimension and not the spatial dimension, the extracted feature information loses its spatial component. This problem is addressed by the convolution block attention module (CBAM) [22], which attends to information in both the channel and spatial dimensions. It performs attention weighting in both dimensions to obtain feature information containing spatial as well as channel context. The network structure of the CBAM module is schematically illustrated in Fig. 2, and a sketch follows it.

Fig. 2 CBAM module structure
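The sketch below shows CBAM's two sequential attention steps in the same hedged spirit; the 7×7 spatial-attention convolution and the reduction ratio of 16 follow the defaults of the CBAM paper [22], not values reported in this study.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))  # average-pooled channel descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))   # max-pooled channel descriptor
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)   # channel-wise average map
        mx, _ = x.max(dim=1, keepdim=True)  # channel-wise max map
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in Fig. 2."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)   # reweight channels first
        return x * self.sa(x)  # then reweight spatial positions
```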

Dynamic convolution

Conventional convolution has the property of weight sharing; that is, all samples share the same convolution parameters. Therefore, to increase model capacity, the depth or width of the network must be increased, which raises the computational complexity and the number of parameters, making deployment difficult. However, some practical applications require models to run in real time with few parameters and computations. To address these problems, Chen et al. [23] proposed a dynamic convolution mechanism that increases the expressiveness of a model without increasing the depth or width of the network. Dynamic convolution continuously and automatically adjusts the convolution parameters according to the input image, processing different images (e.g., viral pneumonia, COVID-19, and normal) with more suitable convolution parameters. A comparison of static and dynamic convolution is presented in Fig. 3: in static convolution, the kernel does not depend on the input, whereas in dynamic convolution, the kernel is a function of the input.

Fig. 3 Static convolution vs dynamic convolution

Unlike static convolution, which uses a single kernel, dynamic convolution generates multiple parallel convolution kernels and combines them into a single dynamic kernel. These kernels are highly data-dependent: the weights of the combination are adjusted dynamically according to the input data, improving the generalization and expressiveness of the network. The dynamic-convolution process is shown in Fig. 4.

Fig. 4 Dynamic convolution structure

As shown in Fig. 4, the input data \(x\) is first passed through an attention module comprising average pooling, an FC layer, a rectified linear unit (ReLU), a second FC layer, and a Softmax function to obtain data-dependent weights for the convolution kernels. These weights are then multiplied by the corresponding initialized convolution-kernel parameters, and the weighted kernels are summed to form a dynamic convolution kernel containing data dependencies. Finally, normalization and an activation function are applied. In summary, dynamic convolution introduces only a small amount of extra computation, through the attention mechanism and the kernel-weighting operation, to improve network performance, as sketched below.
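The following sketch illustrates this process, using the common grouped-convolution trick to apply a different aggregated kernel to each sample in a batch; the kernel count n, reduction ratio, and temperature are illustrative assumptions rather than the exact configuration of ref. [23].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Sketch of dynamic convolution [23]: n parallel kernels mixed by input-dependent attention."""
    def __init__(self, in_ch, out_ch, k=3, n_kernels=4, reduction=4, temperature=30.0):
        super().__init__()
        self.n, self.out_ch, self.k = n_kernels, out_ch, k
        # n parallel convolution kernels to be aggregated per input sample
        self.weight = nn.Parameter(0.01 * torch.randn(n_kernels, out_ch, in_ch, k, k))
        # attention branch of Fig. 4: GAP -> FC -> ReLU -> FC -> Softmax
        self.attn = nn.Sequential(
            nn.Linear(in_ch, in_ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(in_ch // reduction, n_kernels),
        )
        self.temperature = temperature  # softens the softmax early in training (assumed)

    def forward(self, x):
        b, c, h, w = x.shape
        pi = F.softmax(self.attn(x.mean(dim=(2, 3))) / self.temperature, dim=1)  # (b, n)
        # weighted sum of the n kernels -> one data-dependent kernel per sample
        kernels = torch.einsum('bn,noihw->boihw', pi, self.weight)
        kernels = kernels.reshape(b * self.out_ch, c, self.k, self.k)
        # grouped convolution applies each sample's kernel to that sample only
        out = F.conv2d(x.reshape(1, b * c, h, w), kernels, padding=self.k // 2, groups=b)
        return out.reshape(b, self.out_ch, h, w)
```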

Methods

Overall network design

In the process of feature extraction from chest radiographs, detailed features such as textures are easily overlooked because deep networks extract insufficient directional features. Therefore, an X-ray omni-dimensional dynamic convolution feature coordinate attention network, named X-ODFCANet, is proposed. By adding the FCA and ODConv modules to the residual network, the ability of the network to extract feature information from chest radiographs is enhanced. The specific network structure is presented in Fig. 5.

Fig. 5 X-ODFCANet structure

The model comprises five components: an initial convolution layer, a max-pooling layer, ODConv residual modules, a feature coordinate attention module, and an FC layer. The specific configuration of these modules is shown in Table 1.

Table 1 Layer structure of X-ODFCANet

The ODConv residual module extracts features from CXRs and enhances feature extraction by learning attention weights for the convolution kernels in four dimensions: input channels, output channels, convolution-kernel space, and number of convolution kernels, which improves the classification accuracy of the network while reducing the number of parameters. The FCA module decomposes channel attention into two one-dimensional feature-encoding processes using coordinate attention (CA). It extracts relevant feature information from the horizontal and vertical directions and aggregates them, allowing the network to obtain interchannel information while preserving direction-related position information, thereby improving its ability to extract features of lesion areas. The resulting feature information is then input into a global-average pooling layer to unify the feature-map size. Finally, the extracted features are classified using Softmax.

ODConv ResBlock

A regular convolution layer applies a single static kernel to all input samples. A dynamic convolution layer instead uses a linear combination of \(n\) convolution kernels dynamically weighted by an attention mechanism, making the convolution operation dependent on the input image. This dynamic adaptation enhances feature extraction, as the layer can adjust its kernels according to the dimensions and channel counts of the input feature maps. However, existing studies assign dynamic properties only along the kernel-number dimension, ignoring the other three dimensions of the kernel space: the spatial size of the convolution kernel, the number of input channels, and the number of output channels. Consequently, for a given input image, all weights of each convolution kernel share a single attention scalar, which limits the ability to capture rich contextual cues. Moreover, replacing a regular convolution with a dynamic convolution increases the number of convolution parameters by a factor of \(n\); when dynamic convolution is applied to many layers, the number of model parameters increases significantly.

ODConv [24] solves these problems. The ODConv module uses a novel multidimensional attention mechanism with a parallel strategy that learns four complementary types of attention along the four dimensions of the convolution-kernel space. The corresponding convolution kernels progressively incorporate these attentions, significantly enhancing the feature-extraction ability of the basic convolution operation of the CNN while also reducing the number of network parameters. The structure of ODConv is shown in Fig. 6.

Fig. 6 ODConv structure

First, the input \(x\) is compressed into a feature vector of length \({c}_{in}\) by channel-wise global average pooling. An FC layer then maps the compressed vector to a low-dimensional space with reduction ratio \(\gamma\). Each of the four head branches on the right side of the figure contains an FC layer, with output sizes \(k\times k\), \({c}_{in}\times 1\), \({c}_{out}\times 1\), and \(n\times 1\), followed by a sigmoid function that normalizes the output to generate \({\alpha }_{si}\), \({\alpha }_{ci}\), \({\alpha }_{fi}\), and \({\alpha }_{wi}\), respectively; these attentions are then applied to all convolution kernels. This, in turn, enhances the feature-extraction capability of the network for the input chest radiographs, as sketched below.
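The following sketch shows only the four-branch attention computation; it assumes, per the description above, sigmoid normalization on every branch (the reference ODConv implementation applies a softmax to the kernel-number branch instead), and the hidden size is an illustrative choice.

```python
import torch
import torch.nn as nn

class ODConvAttention(nn.Module):
    """Sketch of the four parallel attention heads of ODConv [24] (Fig. 6)."""
    def __init__(self, in_ch, out_ch, k, n_kernels, reduction=16):
        super().__init__()
        hidden = max(in_ch // reduction, 16)  # reduction ratio gamma (assumed value)
        self.squeeze = nn.Sequential(         # GAP -> FC -> ReLU, shared by all heads
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, hidden), nn.ReLU(inplace=True))
        self.alpha_s = nn.Linear(hidden, k * k)      # spatial attention, size k x k
        self.alpha_c = nn.Linear(hidden, in_ch)      # input-channel attention, c_in x 1
        self.alpha_f = nn.Linear(hidden, out_ch)     # output-channel attention, c_out x 1
        self.alpha_w = nn.Linear(hidden, n_kernels)  # kernel-number attention, n x 1

    def forward(self, x):
        z = self.squeeze(x)
        # per the text above, each head is normalized by a sigmoid; the reference
        # ODConv implementation uses a softmax for alpha_w instead
        return (torch.sigmoid(self.alpha_s(z)), torch.sigmoid(self.alpha_c(z)),
                torch.sigmoid(self.alpha_f(z)), torch.sigmoid(self.alpha_w(z)))
```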

Therefore, in this subsection, ODConv replaces the static convolution in the residual modules of the network. The structure of the improved ODConv ResBlock is shown in Fig. 7, where the yellow part denotes the ODConv residual module.

Fig. 7 ODConv ResBlock module structure

FCA module

The attention mechanism is widely used in neural networks, and many networks have adopted the SE attention module. However, this module extracts only channel-related information, ignoring the effect of location information on the output. The CBAM module introduces spatial-information encoding through a large-kernel convolution, considering both the channel and spatial dimensions, and combines the extracted information into attention feature maps containing both channel and spatial information. However, the convolution in the CBAM module can extract only local relations, not long-distance ones. Therefore, this study adopts the CA module [25], which encodes horizontal and vertical position information into the channel attention so that a mobile network can attend to a large range of position information without excessive computation. The structure of the CA module is illustrated in Fig. 8.

Fig. 8 CA module structure

In the CA module, the global pooling operation is decomposed into two one-dimensional feature encodings: the input CXR with dimensions C×H×W is pooled along the X and Y directions to generate feature maps with dimensions C×H×1 and C×1×W, respectively. The CA module comprises compression and excitation stages. The compression operation applies global pooling to the input features \(x\in {\mathbb{R}}^{H\times W\times C}\), averaging over the spatial positions of each channel to generate \(z\in {\mathbb{R}}^{C}\). For example, for channel \(c\),

$${z}^{c}={F}_{gp}\left({x}^{c}\right)=\frac{1}{H\times W}{\sum }_{i=1}^{H}{\sum }_{j=1}^{W}{x}^{c}(i,j),1\le c\le C$$
(1)

In Eq. (1), \({F}_{gp}(\cdot)\) denotes the global pooling operation, \(H\) denotes the height of the input feature \(x\), \(W\) denotes its width, and \(C\) denotes its number of channels. The channel weight \(s\) is then calculated as follows:

$$s=f\left({F}_{sq2}\left(\delta \left({F}_{sq1}\left(z\right)\right)\right)\right)$$
(2)

In Eq. (2), \(f\) denotes the sigmoid activation function, \(\delta\) denotes the ReLU, and \({F}_{sq1}(\cdot)\) and \({F}_{sq2}(\cdot)\) denote 1 × 1 convolution layers with channel-reduction ratio \(r\). Finally, the input and the obtained weights \(s=\left[{s}^{1},{s}^{2},\cdots ,{s}^{C}\right]\) are multiplied channel-wise to obtain

$${\widetilde{x}}^{c}={s}^{c}{x}^{c},1\le c\le C$$
(3)

In Eq. (3), the output features \(\widetilde{x}=\left[{\widetilde{x}}^{1},{\widetilde{x}}^{2},\cdots ,{\widetilde{x}}^{C}\right]\) of the CA module have the same dimensions as the input features \(x\in {\mathbb{R}}^{H\times W\times C}\).
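A PyTorch sketch of the CA operation is given below, following the reference design of Hou et al. [25]; the reduction ratio of 32 is that paper's default and is assumed here.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Sketch of coordinate attention [25] (Fig. 8)."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        hidden = max(channels // reduction, 8)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along W: (b, c, h, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along H: (b, c, 1, w)
        self.conv1 = nn.Conv2d(channels, hidden, 1)    # shared 1x1 reduction
        self.bn = nn.BatchNorm2d(hidden)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(hidden, channels, 1)   # 1x1 expansion per direction
        self.conv_w = nn.Conv2d(hidden, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        xh = self.pool_h(x)                              # (b, c, h, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)          # (b, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))              # direction-aware weights (b, c, h, 1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * ah * aw                               # Eq. (3): channel-wise reweighting
```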

Additionally, the FCA module, built on CA, aggregates deep and shallow features in the network; it mainly comprises the CA module and a global-average pooling layer. First, the module fuses the features extracted by the CA module. The feature-map size is then unified to 1×1 using a global-average pooling operation. Finally, the fusion results are input into the deeper network for feature aggregation. The structure of the FCA module is illustrated in Fig. 9, and a sketch follows it.

Fig. 9 FCA module structure
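Given that description, a hypothetical composition of the FCA module might look as follows, reusing the CoordAttention sketch above; the class name and exact fusion details are assumptions, and the authors' module may differ.

```python
import torch.nn as nn

class FCAModule(nn.Module):
    """Hypothetical sketch of the FCA module of Fig. 9: CA followed by global average pooling."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = CoordAttention(channels)  # from the previous sketch
        self.gap = nn.AdaptiveAvgPool2d(1)  # unify the feature-map size to 1 x 1

    def forward(self, x):
        return self.gap(self.ca(x)).flatten(1)  # (b, channels) vector for the FC classifier
```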

Loss function

The loss function evaluates the model by measuring the difference between the model predictions \(f(x)\) and the ground truth \(y\). This subsection describes the cross-entropy loss used in the classification method. To ensure that the input to the cross-entropy is a probability distribution, a Softmax function is applied before the loss; this transforms the network output into a probability distribution within the range [0, 1]. The equation is as follows:

$$Loss=-\sum_{i=1}^{C} y_{i}\log P_{i}$$
(4)

where \({P}_{i}\) denotes the predicted probability and \({y}_{i}\) denotes the true category. Equation (4) shows that \(Loss\) is always positive and tends to 0 as \({P}_{i}\) tends to \({y}_{i}\).
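In PyTorch, Eq. (4) combined with the preceding Softmax corresponds to the built-in cross-entropy loss, which expects raw logits; a minimal example with illustrative shapes:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 3)           # raw outputs for a batch of 8 images, 3 classes
targets = torch.randint(0, 3, (8,))  # true labels: COVID-19 / common pneumonia / normal
# F.cross_entropy fuses the Softmax of Eq. (4) with the negative log-likelihood,
# so the network should output logits rather than probabilities.
loss = F.cross_entropy(logits, targets)
```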

Results

Datasets

Owing to the specificity of medical datasets, problems such as sample imbalance, low data quality, and dataset scarcity can occur. The datasets used in this study were derived from two publicly available pneumonia datasets. The first was the COVID-19 Radiography Database developed by Chowdhury et al. [26], which contains COVID-19-positive, normal, and common pneumonia images [26, 27]. The second was Chest X-ray (Covid-19 & Pneumonia), a Kaggle database collected from various publicly available resources [28,29,30], containing three categories of chest radiographs: common pneumonia, COVID-19, and normal.

Combining these two sources yielded a dataset of 12,880 chest radiographs in three categories: COVID-19, common pneumonia, and normal. The same preprocessing operations were performed on both sources to ensure a uniform data format. The dataset was divided into training, validation, and testing sets at a ratio of 7:2:1, as sketched below.
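A sketch of such a 7:2:1 split is shown below; the directory name, image size, and transforms are hypothetical, as the authors' exact preprocessing is not specified.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Hypothetical ImageFolder layout: chest_xray_merged/{covid,pneumonia,normal}/*.png
tfm = transforms.Compose([
    transforms.Resize((224, 224)),                # assumed input size
    transforms.Grayscale(num_output_channels=3),  # CXRs as 3-channel input
    transforms.ToTensor(),
])
data = datasets.ImageFolder("chest_xray_merged", transform=tfm)

n = len(data)  # 12,880 images in the merged dataset
n_train, n_val = int(0.7 * n), int(0.2 * n)
train_set, val_set, test_set = random_split(
    data, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(0))  # fixed seed for reproducibility
```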

To ensure a fair comparison, the same training strategy was adopted for all networks, implemented in PyTorch 1.10.1. The experiments were run on an Intel i9-11900K CPU with 64 GB of RAM and an Nvidia RTX A4000 GPU (16 GB).

Experimental results and analysis

To verify the effectiveness of the proposed method, eight networks, DarkCovidNet [31], ConvNext [32], ShuffleNet [33], MobileNetv2 [34], ResNet18, ResNet50, ResNet101, and EfficientNet [35], were selected for comparison. Comparing the classification performance of the different networks demonstrates the effectiveness of the improved method. In this subsection, the experimental results are analyzed in two parts: the training set and the testing set.

(1) Analysis of validation results on the training set

After 35 iterations with a fixed learning rate and four-fold cross-validation, the accuracy curves of each classification method on the training and validation sets were obtained, as shown in Fig. 10. The figure shows that the accuracy of each model gradually levels off as the number of iterations increases. The accuracy of the proposed method is higher than that of the other eight networks on both the training and validation sets. These results show that adding the FCA and ODConv modules improves the classification accuracy of the model.

Fig. 10 Accuracy-variation curves for each classification method on the training (a) and validation (b) sets

After training X-ODFCANet, the average accuracy and number of parameters under four-fold cross-validation on the training set were compared with those of the eight comparison models, as shown in Table 2. After adding the FCA and ODConv modules, the accuracy of the model improves by 3.77% over ResNet18, and the number of parameters decreases to 4.45M, demonstrating the effectiveness of the improved method.

Table 2 Comparison of the average accuracy of each model in the training set
(2) Analysis of experimental results on the testing set

After training was completed, the optimal weights of the improved method were obtained. The model was evaluated on the testing set to obtain the precision, recall, and specificity of X-ODFCANet for each category, and the results were compared with those of the eight comparison models. The metrics of the nine models for the three chest-radiograph categories, namely COVID-19, common pneumonia, and normal, are shown in Tables 3, 4, and 5, respectively (the computation of these per-class metrics is sketched after the tables). The tables show that the proposed method outperforms the other methods in all three categories, except for the common pneumonia category. This demonstrates that the FCA and ODConv modules improve classification accuracy while reducing the number of parameters.

Table 3 Classification results of COVID-19 for each model
Table 4 Classification results of common pneumonia for each model
Table 5 Classification results of each model for the normal category
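The per-class precision, recall, and specificity reported in these tables follow directly from the confusion matrix; the sketch below shows the standard computation, using a hypothetical 3-class matrix for illustration.

```python
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """Precision, recall, and specificity per class from a confusion matrix
    whose rows are true labels and whose columns are predicted labels."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c, but truly another class
    fn = cm.sum(axis=1) - tp  # truly class c, but predicted as another class
    tn = cm.sum() - tp - fp - fn
    return tp / (tp + fp), tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts for (COVID-19, common pneumonia, normal); not the paper's results
cm = np.array([[95, 3, 2],
               [4, 90, 6],
               [1, 5, 94]])
precision, recall, specificity = per_class_metrics(cm)
```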

To compare the pneumonia-classification performance of the proposed method more intuitively, the confusion matrices of the nine models were plotted to compare the effects of the FCA and ODConv modules on model performance. Figure 11g shows that the average accuracy of X-ODFCANet in predicting COVID-19, common pneumonia, and normal cases is 97.57%, an increase of 3.77% over ResNet18, proving the effectiveness of the proposed method.

Fig. 11 Confusion-matrix diagram for the nine models

Discussion

Qualitative analysis

The experimental results in Tables 3, 4, and 5 indicate that the network misclassifies a subset of chest radiographs with inconspicuous symptoms, and that the classification accuracy is not balanced across categories. This may be because radiographs with inconspicuous symptoms resemble those of other categories in terms of image features, making them difficult for the network to classify accurately.

Furthermore, the issue of inconsistent network performance across categories should be addressed in future studies. In practice, the number of samples in different categories may vary considerably, which can lead to suboptimal performance on minority categories; new loss functions and training strategies must therefore be designed to address this imbalance. Additionally, although the proposed method achieves higher accuracy with fewer model parameters, it is more computationally intensive; future work will further improve the feature-extraction method to solve this problem.

As illustrated in Fig. 11, the method classifies all three categories well. However, it has difficulty distinguishing normal lungs from common pneumonia because the features of these two classes are not clearly distinct. In the future, multitask learning strategies or more effective data preprocessing and augmentation techniques may help resolve this shortcoming, and further optimization of the network structure will be beneficial for improving performance and reducing the number of network parameters.

Limitations

In practice, the number of samples may vary considerably among categories; thus, the model may perform inadequately on minority categories, and novel loss functions and training strategies must be developed to address this issue. Future work will also improve the feature-extraction method to reduce the computational cost of the model. Moreover, because manually labelling chest radiographs is difficult and labelled data are scarce, weakly supervised or unsupervised learning could be used in the future to train on chest-radiograph datasets containing only image-level labels.

Unsupervised methodologies, such as self-supervised learning or generative adversarial networks [36], may be used to train models on unlabeled data, helping to exploit such data and thereby enhancing model performance. In addition, transfer learning or meta-learning may be investigated to improve the generalization capacity of the model; these methods reuse existing knowledge across tasks and domains. Although the proposed method performs satisfactorily in several respects, numerous challenges require further attention, and model performance will be enhanced in future work.

The X-ODFCANet method can be applied to other medical-imaging tasks, such as classifying other types of image data and other chest diseases. Because X-ODFCANet has been shown to adapt well to lung structures, it can be generalized, after parameter tuning, to similar lung diseases such as lung cancer. Such applications would validate the generality and scalability of the proposed method. Furthermore, the feature-extraction capability of X-ODFCANet can be strengthened by designing new feature-extraction networks. Future work will investigate new or more efficient model structures, such as lightweight or depthwise-separable architectures, which are anticipated to further optimize the performance of the method for medical-image-processing tasks.

However, this study did not provide a detailed classification of pneumonia, which is necessary for precise treatment protocols. The proposed method did not differentiate between bacterial and viral pneumonia, even though different types of pneumonia require different treatments. In future studies, a more fine-grained classification of pneumonia should be considered to enable more accurate treatment protocols.

Conclusions

To address the issues of low accuracy and a high number of model parameters in pneumonia classification, X-ODFCANet, an ODConv feature coordinate attention classification network for CXRs, was proposed. ODConv replaces the original static convolution module and extracts features from the input CXRs along four dimensions: the spatial dimensions of the convolution kernel, the number of input channels, the number of output channels, and the number of convolution kernels. The feature information extracted from the four dimensions is fused, and the extracted features are classified. Compared with static convolution, ODConv improves feature-extraction accuracy and reduces information redundancy, resulting in a more lightweight model with fewer parameters. In addition, the FCA module fuses the feature information aggregated from the horizontal and vertical spatial directions. The feature-map size is then unified using a global-average pooling operation, and the fusion results are finally input into the deep network for classification. The FCA module improves the ability of the model to extract feature information, resulting in more accurate pneumonia classification.

Experiments were conducted on the combined dataset using the improved classification network. The results showed that X-ODFCANet effectively improves the accuracy of pneumonia classification while reducing the number of model parameters. The average accuracy across the three categories, COVID-19, common pneumonia, and normal, was 97.57%, which is 3.77% higher than that of ResNet18.

Availability of data and materials

The datasets used in this study are the publicly available COVID-19 Radiography Database and the Chest X-ray (Covid-19 & Pneumonia) dataset, both provided online.

Abbreviations

COVID-19:

Coronavirus disease 2019

CXR:

Chest X-ray

CT:

Computed tomography

MRI:

Magnetic resonance imaging

SE:

Squeeze-and-excitation

FC:

Fully connected

FCA:

Feature coordinate attention

ODConv:

Omni-dimensional dynamic convolution

CA:

Coordinate attention

CBAM:

Convolution block attention module

ReLU:

Rectified linear unit

References

  1. GBD 2019 Diseases and Injuries Collaborators (2020) Global burden of 369 diseases and injuries in 204 countries and territories, 1990-2019: a systematic analysis for the global burden of disease study 2019. Lancet 396(10258):1204–1222

  2. Fadel SA, Boschi-Pinto C, Yu SC, Reynales-Shigematsu LM, Menon GR, Newcombe L et al (2019) Trends in cause-specific mortality among children aged 5-14 years from 2005 to 2016 in India, China, Brazil, and Mexico: an analysis of nationally representative mortality studies. Lancet 393(10176):1119–1127. https://doi.org/10.1016/S0140-6736(19)30220-X

  3. Baek MS, Park S, Choi JH, Kim CH, Hyun IG (2020) Mortality and prognostic prediction in very elderly patients with severe pneumonia. J Intensive Care Med 35(12):1405–1410. https://doi.org/10.1177/0885066619826045

  4. Hassan H, Ren ZY, Zhao HS, Huang SJ, Li D, Xiang SH et al (2022) Review and classification of AI-enabled COVID-19 CT imaging models based on computer vision tasks. Comput Biol Med 141:105123. https://doi.org/10.1016/j.compbiomed.2021.105123

  5. Tian SJ, Hu N, Lou J, Chen K, Kang XQ, Xiang ZJ et al (2020) Characteristics of COVID-19 infection in Beijing. J Infect 80(4):401–406. https://doi.org/10.1016/j.jinf.2020.02.018

  6. Najaran MHT (2023) A genetic programming-based convolutional deep learning algorithm for identifying COVID-19 cases via X-ray images. Artif Intell Med 142:102571. https://doi.org/10.1016/j.artmed.2023.102571

  7. Celik G (2023) Detection of covid-19 and other pneumonia cases from CT and X-ray chest images using deep learning based on feature reuse residual block and depthwise dilated convolutions neural network. Appl Soft Comput 133:109906. https://doi.org/10.1016/j.asoc.2022.109906

  8. Yucel S, Aycicek T, Bilgici MC, Dincer OS, Tomak L (2021) 3 tesla MRI in diagnosis and follow up of children with pneumonia. Clin Imaging 79:213–218. https://doi.org/10.1016/j.clinimag.2021.05.027

  9. Jin YH, Cai L, Cheng ZS, Cheng H, Deng T, Fan YP et al (2020) A rapid advice guideline for the diagnosis and treatment of 2019 novel coronavirus (2019-nCoV) infected pneumonia (standard version). Mil Med Res 7(1):4

  10. Ai T, Yang ZL, Hou HY, Zhan CN, Chen C, Lv WZ et al (2020) Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology 296(2):E32–E40. https://doi.org/10.1148/radiol.2020200642

  11. Kovács A, Palásti P, Veréb D, Bozsik B, Palkó A, Kincses ZT (2021) The sensitivity and specificity of chest CT in the diagnosis of COVID-19. Eur Radiol 31(5):2819–2824. https://doi.org/10.1007/s00330-020-07347-x

  12. Li L, Qin LX, Xu ZG, Yin YB, Wang X, Kong B et al (2020) Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy. Radiology 296(2):E65–E71. https://doi.org/10.1148/radiol.2020200905

  13. Aggarwal P, Mishra NK, Fatimah B, Singh P, Gupta A, Joshi SD (2022) COVID-19 image classification using deep learning: Advances, challenges and opportunities. Comput Biol Med 144:105350. https://doi.org/10.1016/j.compbiomed.2022.105350

  14. Hou J, Gao T (2021) Explainable DCNN based chest X-ray image analysis and classification for COVID-19 pneumonia detection. Sci Rep 11(1):16071. https://doi.org/10.1038/s41598-021-95680-6

  15. Serrano CO, Alonso E, Andrés M, Buitrago N, Vigara AP, Pajares MP et al (2020) Pediatric chest X-ray in COVID-19 infection. Eur J Radiol 131:109236. https://doi.org/10.1016/j.ejrad.2020.109236

  16. Rubin GD, Ryerson CJ, Haramati LB, Sverzellati N, Kanne JP, Raoof S et al (2020) The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the fleischner society. Radiology 296(1):172–180. https://doi.org/10.1148/radiol.2020201365

  17. Tian YJ, Fu SJ (2020) A descriptive framework for the field of deep learning applications in medical images. Knowl-Based Syst 210:106445. https://doi.org/10.1016/j.knosys.2020.106445

  18. Castro-Zunti R, Park EH, Choi Y, Jin GY, Ko SB (2020) Early detection of ankylosing spondylitis using texture features and statistical machine learning, and deep learning, with some patient age analysis. Comput Med Imaging Graph 82:101718. https://doi.org/10.1016/j.compmedimag.2020.101718

  19. Kim RY, Oke JL, Pickup LC, Munden RF, Dotson TL, Bellinger CR et al (2022) Artificial intelligence tool for assessment of indeterminate pulmonary nodules detected with CT. Radiology 304(3):683–691. https://doi.org/10.1148/radiol.212182

  20. He KM, Zhang XY, Ren SQ, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE conference on computer vision and pattern recognition, IEEE, Las Vegas, 27-30 June 2016. https://doi.org/10.1109/CVPR.2016.90

  21. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the 2018 IEEE/CVF conference on computer vision and pattern recognition, IEEE, Salt Lake City, 18-23 June 2018. https://doi.org/10.1109/CVPR.2018.00745

  22. Woo S, Park J, Lee JY, Kweon IS (2018) CBAM: Convolutional block attention module. In: Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, 8-14 September 2018. https://doi.org/10.1007/978-3-030-01234-2_1

  23. Chen YP, Dai XY, Liu MC, Chen DD, Yuan L, Liu ZC (2020) Dynamic convolution: Attention over convolution kernels. In: Proceedings of the 2020 IEEE/CVF conference on computer vision and pattern recognition, IEEE, Seattle, 13-19 June 2020. https://doi.org/10.1109/CVPR42600.2020.01104

  24. Li C, Zhou AJ, Yao AB (2022) Omni-dimensional dynamic convolution. arXiv preprint arXiv:2209.07947

  25. Hou QB, Zhou DQ, Feng JS (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the 2021 IEEE/CVF conference on computer vision and pattern recognition, IEEE, Nashville, 20-25 June 2021. https://doi.org/10.1109/CVPR46437.2021.01350

  26. Chowdhury MEH, Rahman T, Khandakar A, Mazhar R, Kadir MA, Mahbub ZB et al (2020) Can AI help in screening viral and COVID-19 pneumonia? IEEE Access 8:132665–132676. https://doi.org/10.1109/ACCESS.2020.3010287

  27. Rahman T, Khandakar A, Qiblawey Y, Tahir A, Kiranyaz S, Abul Kashem SB et al (2021) Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Comput Biol Med 132:104319. https://doi.org/10.1016/j.compbiomed.2021.104319

  28. Cohen JP, Morrison P, Dao L, Roth K, Duong TQ, Ghassemi M (2020) COVID-19 image data collection: Prospective predictions are the future. arXiv preprint arXiv:2006.11988. https://doi.org/10.59275/j.melba.2020-48g7

  29. Kermany D, Zhang K, Goldbaum M (2018) Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification. Mendeley Data. https://doi.org/10.17632/rscbjbr9sj.2

  30. Wang LD, Lin ZQ, Wong A (2020) COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci Rep 10(1):19549. https://doi.org/10.1038/s41598-020-76550-z

  31. Ozturk T, Talo M, Yildirim EA, Baloglu UB, Yildirim O, Acharya UR (2020) Automated detection of COVID-19 cases using deep neural networks with X-ray images. Comput Biol Med 121:103792. https://doi.org/10.1016/j.compbiomed.2020.103792

  32. Liu Z, Mao HZ, Wu CY, Feichtenhofer C, Darrell T, Xie SN (2022) A ConvNet for the 2020s. In: Proceedings of the 2022 IEEE/CVF conference on computer vision and pattern recognition, IEEE, New Orleans, 18-24 June 2022. https://doi.org/10.1109/CVPR52688.2022.01167

  33. Zhang XY, Zhou XY, Lin MX, Sun J (2018) ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the 2018 IEEE/CVF conference on computer vision and pattern recognition, IEEE, Salt Lake City, 18-23 June 2018. https://doi.org/10.1109/CVPR.2018.00716

  34. Sandler M, Howard A, Zhu ML, Zhmoginov A, Chen LC (2018) MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the 2018 IEEE/CVF conference on computer vision and pattern recognition, IEEE, Salt Lake City, 18-23 June 2018. https://doi.org/10.1109/CVPR.2018.00474

  35. Tan MX, Le QV (2019) EfficientNet: Rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th international conference on machine learning, IMLS, Long Beach, California, 9-15 June 2019

  36. Jeong JJ, Tariq A, Adejumo T, Trivedi H, Gichoya JW, Banerjee I (2022) Systematic review of generative adversarial networks (GANs) for medical image classification and segmentation. J Digit Imaging 35(2):137–152. https://doi.org/10.1007/s10278-021-00556-w

Acknowledgments

We would like to thank the authors of the COVID-19 Radiography Database and the Chest X-ray (Covid-19 & Pneumonia) for providing valuable data used in this study.

Funding

This work was supported in part by the Key Research and Development Program of Shaanxi Province of China, No. 2024GX-YBXM-149; and in part by the National Natural Science Foundation of China, No. 62071381.

Author information

Authors and Affiliations

Authors

Contributions

YL contributed to conceptualization, methodology, and writing-original draft; YX contributed to data curation, and experimental design; XL contributed to methodology, experimental design and coding; YZ contributed to writing-review, and validation; CL contributed to investigation, and visualization; ZC contributed to validation, and writing-review; SD contributed to methodology, writing-review and supervision; LW contributed to methodology, writing-review, visualization, and supervision. All the authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Shaoyi Du or Lin Wang.

Ethics declarations

Competing interests

The authors declare no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, Y., Xin, Y., Li, X. et al. Omni-dimensional dynamic convolution feature coordinate attention network for pneumonia classification. Vis. Comput. Ind. Biomed. Art 7, 17 (2024). https://doi.org/10.1186/s42492-024-00168-5
