1 Introduction

Pneumothorax is a lung abnormality in which air leaks into the space between the lung and the chest wall. It can be caused by chest injury or trauma, certain medical procedures, damage from underlying lung disease, or sometimes for no obvious reason [12]. The most common imaging tool for the diagnosis of pneumothorax is chest X-ray (CXR). Large pneumothorax can be fatal as air compression may cause significant impairment to circulation and respiration. Accordingly, large pneumothorax should be classified as a critical abnormality that requires immediate treatment, as shown in the latest computerized CXR triage study [1].

Since the diagnosis and treatment of critical pneumothorax are directly related to the size of the pneumothorax area, accurate pneumothorax segmentation, rather than image-level classification, is needed for the triage of CXR. However, as shown in a previous study [5], the performance of pneumothorax diagnosis in CXR is highly dependent on the physician's experience, implying that high-quality pixel-level annotations are difficult to obtain and can be very limited. On the other hand, large numbers of image-level annotations can be relatively easy to acquire with text analysis techniques on radiological reports [9]. Motivated by this, we propose a weakly supervised learning approach that aims to ease the requirement of annotating all training data at pixel level. Specifically, we allow parts of the training data to be weakly annotated with only image-level labels, i.e., pneumothorax or not.

In the literature, to boost the localization of abnormalities, Li et al. [7] developed an end-to-end deep multi-instance network to identify image-level abnormalities with annotated bounding boxes. Yan et al. [10] introduced a weakly-supervised deep learning framework for abnormality identification and localization. Cai et al. [2] further proposed an attention mining strategy to improve identification performance. However, these studies focused on providing rough heatmaps to indicate abnormal regions; precise pixel-level segmentation results cannot be acquired through these methods.

Fig. 1. Training process of our weakly supervised segmentation method. In the testing stage, only the segmentation model is used.

To achieve better synergy between the well- and weakly-annotated data, our approach employs the spatial label smoothing regularization (SLSR) technique to leverage the network-generated attention masks from the weakly-annotated data. Specifically, we realize the proposed method in two stages, see Fig. 1. First, an image-level classification (pneumothorax or not) model is implemented to obtain the attention masks. The attention masks from this classification model may roughly suggest the pneumothorax regions in the weakly-annotated data. Second, an image segmentation model is designed to use both well- and weakly-annotated data, where the attention masks for the weakly-annotated data are incorporated. Since the attention masks do not delineate the exact pneumothorax regions and may contain errors, we employ the SLSR technique to account for the uncertainty arising from the incorrectness of the attention masks during the training of the segmentation model. The label smoothing regularization technique addresses the label corruption issue by treating labels as probability variables with slight numerical perturbation, and has been shown to be robust to noisy labels [13]. Our experimental results show that our method can improve the performance of the segmentation model with both well- and weakly-annotated data. The contribution of this paper is two-fold. First, a novel weakly supervised framework is proposed to equip deep learning segmentation with the capability of learning from well- and weakly-annotated data. Second, we demonstrate the effectiveness of the weakly supervised framework on the pneumothorax segmentation problem in CXR images. As shown in Fig. 2, pneumothorax segmentation is difficult, as several issues, such as inter-subject variations and the variety of pneumothorax degree, location, and shape, need to be addressed. Pneumothorax segmentation can help identify critical cases and expedite the treatment process.

Fig. 2. Several pneumothorax cases. Blue masks indicate the pneumothorax regions labeled by an experienced radiologist. Inter-subject variations, as well as the variety of pneumothorax degree, location, and shape, can be observed. The last case is hydropneumothorax with pleural effusion and pneumothorax. (Color figure online)

2 Methodology

As shown in Fig. 1, the proposed weakly supervised approach consists of two steps: (1) image-level classification that generates attention masks at the same time; (2) pixel-level segmentation with SLSR that leverages attention masks. The details will be elaborated as follows.

2.1 Image-Level Classification

Image-level classification for our task (pneumothorax or not) is carried out with the ResNet-101 model [11]. The attention mining method, Guided Attention Inference Network (GAIN) [6], is performed to generate the attention masks. The obtained attention masks are 1/8 of the input size, and they are further resized via bilinear interpolation to match the input size required by the pixel-level segmentation model.
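The upsampling of the attention masks can be sketched as follows. This is a minimal numpy implementation of bilinear interpolation for illustration only; the function name and the 32-to-256 sizes are assumptions (an actual pipeline would typically use a framework routine).

```python
import numpy as np

def bilinear_resize(mask, out_h, out_w):
    """Bilinearly resize a 2-D attention mask to (out_h, out_w)."""
    in_h, in_w = mask.shape
    # Source coordinates for each output pixel.
    ys = np.linspace(0.0, in_h - 1, out_h)
    xs = np.linspace(0.0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = mask[y0][:, x0] * (1 - wx) + mask[y0][:, x1] * wx
    bottom = mask[y1][:, x0] * (1 - wx) + mask[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

# An attention mask at 1/8 resolution (32x32) resized to the 256x256 input size.
att = np.random.rand(32, 32)
resized = bilinear_resize(att, 256, 256)
```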

2.2 Pixel-Level Segmentation

The SLSR technique is developed to leverage the attention masks of the weakly-annotated data for better performance and to mitigate the limited availability of well-annotated data. Since the attention masks only capture rough pneumothorax regions and may contain errors, we explore the uncertainty of the attention masks by numerically perturbing the one-hot label distribution into the probabilistic domain, as shown in Fig. 3. Specifically, let \(k \in \{0,1\}\) be the object class, where class 0 stands for the background and class 1 for pneumothorax.

Fig. 3. Perturbation of the one-hot label distribution into the probabilistic domain. For the left real mask, the label is either 1 or 0, whereas labels in the right attention mask are perturbed with \(\varepsilon \).

Assuming that the label prior distribution is uniform, the ground-truth label distribution with the consideration of potential label corruption at each pixel \(PIX_{i,j}\) (i and j refer to the row and column indices, respectively) is defined as:

$$\begin{aligned} \small {q_{i,j}}(k) = {\left\{ \begin{array}{ll} \frac{\varepsilon }{2}, &{} k \ne {y_{i,j}}\\ 1 - \varepsilon + \frac{\varepsilon }{2}, &{} k = {y_{i,j}} \end{array}\right. } = {\left\{ \begin{array}{ll} \frac{\varepsilon }{2}, &{} k \ne {y_{i,j}}\\ 1 - \frac{\varepsilon }{2}, &{} k = {y_{i,j}}, \end{array}\right. } \end{aligned}$$
(1)

where \(y_{i,j}\) is the ground-truth class of the pixel \(PIX_{i,j}\), and \(\varepsilon \) is the perturbing parameter. The uncertainty for an attention mask can be categorized into two types: all-class and ground-truth uncertainty. The all-class uncertainty suggests that each pixel in an attention mask has equal potential to be either pneumothorax or background with the probability of \(\varepsilon /2\). The ground-truth uncertainty specifies that the pneumothorax region in an attention mask may have errors and the corresponding probabilities of the pixels are perturbed with \(\varepsilon \).
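The perturbed label distribution of Eq. 1 can be sketched in a few lines of numpy. This is a minimal illustration; the \((H, W, 2)\) array layout and the function name are assumptions for the sketch.

```python
import numpy as np

def smoothed_label_distribution(y, eps=0.1):
    """Eq. 1: per-pixel label distribution q over the classes {0, 1}.

    y   : (H, W) integer mask, 0 = background, 1 = pneumothorax.
    eps : perturbing parameter.
    Returns a (H, W, 2) array with q[i, j, k].
    """
    # Every class first receives eps/2 (the case k != y_ij) ...
    q = np.full(y.shape + (2,), eps / 2.0)
    # ... then the ground-truth class is raised to 1 - eps/2 (the case k == y_ij).
    np.put_along_axis(q, y[..., None], 1.0 - eps / 2.0, axis=-1)
    return q

y = np.array([[0, 1], [1, 0]])
q = smoothed_label_distribution(y, eps=0.1)
```

Note that each pixel's distribution still sums to one, as a probability distribution must.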

Let \({p_{i,j}}(k)\) be the predicted probability of class k from the network, obtained from the softmax function after the final convolutional layer of the segmentation model. With Eq. 1, the cross-entropy loss of the pixel \(PIX_{i,j}\) can be further computed as:

$$\begin{aligned} \small {l_{i,j}} = - \sum \limits _{k = 0}^1 {\log ({p_{i,j}}(k)){q_{i,j}}(k)}=- (1 - \varepsilon )\log ({p_{i,j}}(y_{i,j})) - \frac{\varepsilon }{{2}}\sum \limits _{k = 0}^1 {\log ({p_{i,j}}(k))}. \end{aligned}$$
(2)
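The rewriting in Eq. 2 can be checked numerically: the cross-entropy against the smoothed distribution q from Eq. 1 equals the two-term form on the right-hand side. A small self-contained numpy check, where the probability values are arbitrary examples:

```python
import numpy as np

eps = 0.1
p = np.array([0.3, 0.7])  # softmax output p_ij(k) for one pixel, k in {0, 1}
y = 1                     # ground-truth class of the pixel

# Left-hand side of Eq. 2: cross-entropy against the smoothed distribution q.
q = np.array([eps / 2 if k != y else 1 - eps / 2 for k in (0, 1)])
lhs = -np.sum(q * np.log(p))

# Right-hand side of Eq. 2: the two-term form.
rhs = -(1 - eps) * np.log(p[y]) - (eps / 2) * np.sum(np.log(p))

assert np.isclose(lhs, rhs)  # the two forms agree
```

The identity follows from splitting \(1 - \varepsilon /2\) into \((1 - \varepsilon ) + \varepsilon /2\) and absorbing the latter into the sum over both classes.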

Accordingly, the loss function for a weakly-annotated image with the attention mask is the summation of \(l_{i,j}\) over all pixels (H and W are the height and width of the input image), which is defined as:

$$\begin{aligned} \begin{aligned} \small l = \sum \limits _{i = 1}^H {\sum \limits _{j = 1}^W {l_{i,j}} } = - (1 - \varepsilon )\sum \limits _{i = 1}^H {\sum \limits _{j = 1}^W {\log ({p_{i,j}}(y_{i,j}))} } - \frac{\varepsilon }{{2}}\sum \limits _{k = 0}^1 {\sum \limits _{i = 1}^H {\sum \limits _{j = 1}^W {\log ({p_{i,j}}(k))} } } . \end{aligned} \end{aligned}$$
(3)

Since the training data comprise well- and weakly-annotated data, we further introduce an indicator variable, z, to specify whether a sample is well or weakly annotated. Meanwhile, referring to Eq. 3, the first term is only effective for the ground-truth class, whereas the second term considers both the foreground and background classes. Since in most cases the number of background pixels is significantly larger than the number of foreground pixels, we implement a weighting factor to mitigate the pixel sample imbalance issue. Specifically, the loss function is further defined as:

$$\begin{aligned} \begin{aligned} {l_{SLSR}} =&- (1 - z \cdot \varepsilon )\sum \limits _{i = 1}^H {\sum \limits _{j = 1}^W {\log ({p_{i,j}}(y_{i,j}))}} \\&- z\sum \limits _{k = 0}^1 {\left\{ {(1 + \sum \limits _{i = 1}^H {\sum \limits _{j = 1}^W {I(k = {y_{i,j}})} } ) \cdot \frac{\varepsilon }{{2HW}} \cdot \sum \limits _{i = 1}^H {\sum \limits _{j = 1}^W {\log ({p_{i,j}}(k))} } } \right\} }, \end{aligned} \end{aligned}$$
(4)

where \({I(k={y_{i,j}})}\) is an indicator function to count the ground-truth pixels of the k-th class. For a well-annotated image sample, we set \(z = 0\), while \(z=1\) for a weakly-annotated sample. With the implementation of Eq. 4, the pixel-level segmentation model can be trained with both well- and weakly-annotated data and still attain satisfactory performance.
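Equation 4 can be prototyped directly from its terms. Below is a minimal numpy sketch of the per-image loss; the \((H, W, 2)\) probability layout and the function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def slsr_loss(p, y, z, eps=0.1):
    """Eq. 4: SLSR loss for one image.

    p   : (H, W, 2) softmax probabilities from the segmentation network.
    y   : (H, W) integer labels (real mask if well-annotated, attention mask if weak).
    z   : 0 for a well-annotated sample, 1 for a weakly-annotated sample.
    eps : perturbing parameter.
    """
    H, W = y.shape
    log_p = np.log(p)
    # First term: cross-entropy against the ground-truth (or attention) labels.
    log_p_y = np.take_along_axis(log_p, y[..., None], axis=-1)[..., 0]
    loss = -(1.0 - z * eps) * log_p_y.sum()
    # Second term: class-weighted smoothing, active only for weak samples (z = 1).
    for k in (0, 1):
        n_k = (y == k).sum()  # ground-truth pixel count of class k
        loss -= z * (1 + n_k) * eps / (2.0 * H * W) * log_p[..., k].sum()
    return loss

# Toy example: a 2x2 image where every pixel is predicted pneumothorax with p = 0.7.
p = np.stack([np.full((2, 2), 0.3), np.full((2, 2), 0.7)], axis=-1)
y = np.ones((2, 2), dtype=int)     # labels: all pneumothorax
loss_well = slsr_loss(p, y, z=0)   # reduces to the plain cross-entropy
loss_weak = slsr_loss(p, y, z=1)   # smoothed, class-weighted variant
```

Setting z = 0 zeroes the second term and restores the vanilla cross-entropy, matching the behavior described for well-annotated samples.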

3 Experiments

3.1 Dataset

In total, 5400 frontal-view chest X-ray images were collected from the Mianyang Central Hospital, Mianyang, Sichuan, China, with IRB approval. Specifically, 3400 images were diagnosed with pneumothorax, whereas the remaining 2000 images are normal cases. 800 pneumothorax images are well-annotated with pixel-level ground-truth masks by an experienced radiologist, and the remaining 2600 pneumothorax images only have image-level labels. The 800 well-annotated data and 2000 normal cases are randomly and evenly split into 2 groups for training and testing. Therefore, the total number of testing data is 1400, whereas the number of available data for training is 4000.

3.2 Experimental Settings

For the segmentation network, three state-of-the-art (SOTA) networks are implemented: U-Net [8], LinkNet [3], and Tiramisu [4]. For Tiramisu, the specific FCDenseNet67 variant is employed. For all three models, the Adam optimizer is employed with a momentum of 0.9. The learning rate is initialized to 0.0001 and decayed by a factor of 0.1 every 100 epochs. The size of the network inputs is \(256 \times 256\). For U-Net and LinkNet, we set the batch size to 32, whereas the batch size of the Tiramisu model is set to 5. The perturbing parameter \(\varepsilon \) is set to 0.1.
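The step decay schedule described above can be written as a one-line function. This is a sketch of the stated schedule only; in practice a framework scheduler would be used.

```python
def learning_rate(epoch, base_lr=1e-4, gamma=0.1, step=100):
    """Initial LR 0.0001, decayed by a factor of 0.1 every 100 epochs."""
    return base_lr * gamma ** (epoch // step)
```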

Table 1. Performance of the different SOTA models; Tiramisu achieves the best performance.
Table 2. Performance w.r.t. different pneumothorax degrees.

3.3 Experimental Results

Results of SOTA Networks. In this experiment, we aim to illustrate the performance limitation of the SOTA networks on the pneumothorax segmentation problem. Specifically, the training data for segmentation, i.e., 400 well-annotated and 1000 normal CXR images, are employed to train the three SOTA networks. The results of the SOTA networks on the 1400 testing data with the IoU (intersection over union) metric are shown in Table 1. As can be found, the Tiramisu model achieves the best performance with an IoU value of 0.640, which suggests that the pneumothorax segmentation problem is quite difficult. To further investigate the effect of pneumothorax severity on segmentation efficacy, the 400 pneumothorax cases are further divided into three groups of small, medium, and large, based on collapse ratios of \(\le 0.10\), 0.10–0.30, and \(\ge 0.30\), respectively. The collapse ratio of the pneumothorax region against the lung field is one of the common quantitative metrics for measuring pneumothorax severity. The small, medium, and large groups comprise 156, 142, and 102 cases, respectively. The performance of the Tiramisu model on the three groups is shown in Table 2. Since the IoU is very sensitive to object size, the IoU value for the small group is not very high. Small pneumothorax is usually less critical than large pneumothorax, and sometimes may heal on its own.
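The IoU metric and the severity grouping can be sketched as follows. The handling of the boundary values 0.10 and 0.30 is an assumption, since the stated criteria overlap at the endpoints, and the function names are illustrative.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred, gt).sum() / union

def severity_group(collapse_ratio):
    """Bucket a case by the collapse ratio of the pneumothorax against the lung field."""
    if collapse_ratio <= 0.10:
        return "small"
    elif collapse_ratio < 0.30:
        return "medium"
    return "large"
```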

Table 3. The results with different combinations of well- and weakly-annotated data. The column “Test” reports the IoU performance on the entire testing data, whereas the columns “Small”, “Medium” and “Large” report the IoU performances on the three severity groups, respectively.

Efficacy of the SLSR. In this experiment, the efficacy of the SLSR in the weakly supervised framework is illustrated. Referring to Table 1, we here only consider the Tiramisu model. The image-level classification model (GAIN) is first trained with the 2600 weakly-annotated pneumothorax and 1000 normal images to obtain the attention masks. We use this model to generate attention masks for all test images and take these results as segmentation predictions, yielding an IoU value of 0.128, which serves as the baseline of only using image-level annotations. Afterward, the segmentation models are trained with different ratios between well- and weakly-annotated data. Specifically, we set 4 groups of experiments that include 100, 200, 300, and 400 well-annotated pneumothorax cases. For each group, we first train a Tiramisu model using the selected well-annotated pneumothorax cases and 1000 normal images. Then, we also add the selected well-annotated cases to retrain the GAIN model, which helps to improve the quality of the generated attention masks. Finally, we consider several experiments that add 0, 200, 400, and 800 weakly-annotated pneumothorax cases to train the Tiramisu models with the SLSR loss (Eq. 4).

The experimental results are shown in Table 3. As can be found, involving more well-annotated data improves the segmentation performance. Meanwhile, for each group of experiments, it can be observed from Table 3 that the best synergy is achieved when the numbers of weakly- and well-annotated data are close. In particular, when 200 weakly-annotated data are added to the 300 well-annotated group, the best segmentation performance of 0.637 IoU can be achieved, which is close to the 0.640 IoU achieved by the Tiramisu model in the SOTA experiment. Meanwhile, the models with 200–800 weakly-annotated data in the 400 well-annotated group outperform the model with only 400 well-annotated data, and the best performance is achieved by the model with 400 weakly-annotated and 400 well-annotated data (0.669 IoU). We also conduct an experiment on the setting of \(\varepsilon \), with the results given in Table 4, where \(\varepsilon =0\) corresponds to the vanilla cross-entropy loss. We can see that the setting of \(\varepsilon =0.1\) achieves the best performance.

Table 4. Results of different settings for \(\varepsilon \) value.
Fig. 4. Visualization of results.

Visualization Results. The intermediate results of three testing data with different degrees of pneumothorax severity are shown in Fig. 4. As can be found, the segmentation results with only 200 well-annotated data (without including weakly-annotated data) are not very promising. However, with the inclusion of more well-annotated data and an equal number of weakly-annotated data, better performance can be achieved. Meanwhile, the attention maps of the classification model with image-level annotations are also shown in Fig. 4. The attention maps are very rough and contain too many errors to precisely segment and measure the pneumothorax for the application of CXR triage. More examples can be found in the supplementary materials.

4 Conclusion

A novel spatial label smoothing regularization method has been developed to explore the uncertainty of weakly-annotated data in a weakly supervised segmentation framework. As can be observed in Table 3 and Fig. 4, the proposed method can relieve the need for well-annotated data while achieving competitive performance. The proposed method has been evaluated with extensive experiments on the difficult pneumothorax segmentation problem in CXR images.