Mining multi-center heterogeneous medical data with distributed synthetic learning

The study and its results comply with relevant ethical regulations and follow appropriate ethical standards for research involving human subjects.

Data collection and processing

We collected three categories of datasets, described in Table 1, to evaluate our method: (1) multi-center cardiac computed tomography angiography (CTA); (2) multi-modality brain magnetic resonance imaging (MRI); (3) multi-organ histopathology images. The data heterogeneity lies in several aspects, including the number of samples, acquisition scanners, resolutions, geographic locations, modalities (the missing-modality setting), and organs (the histopathology dataset). Supplementary Figs. 6, 7, and 8 show differences among data samples from multiple centers.

For the cardiac CTA data, we collected three public cardiac CTA datasets acquired from institutes in different parts of the world: the Multi-Modality Whole Heart Segmentation (MM-WHS) challenge dataset64,65,66, the Automated Segmentation of Coronary Arteries (ASOCA) challenge 2020 dataset67,68, and the MICCAI Coronary Artery Tracking Challenge 2008 (CAT08) dataset69. The heterogeneity of scanners and radiology protocols results in varying voxel spacings and image quality. We used only the CTA data in the MM-WHS dataset and denote this subset as the WHS dataset in Table 1. The WHS data have manually annotated labels of seven whole-heart substructures. We generated annotations of the same substructures for the CAT08 and ASOCA datasets by using a state-of-the-art whole heart segmentation algorithm70 in the SenseCare research platform71 and manually correcting gross errors. All cardiac CTA data were resampled to isotropic 0.8 mm resolution. We used 200 and 1000 as the window level and width to convert Hounsfield units to intensity values in our experiments.
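
As a minimal illustration of the intensity windowing step above (window level 200 HU, width 1000 HU), the sketch below maps Hounsfield units to normalized intensities; the function name and the clipping to [0, 1] are our own assumptions rather than details from the authors' code.

```python
import numpy as np

def window_hu(volume_hu: np.ndarray, level: float = 200.0, width: float = 1000.0) -> np.ndarray:
    """Map Hounsfield units to [0, 1] with the given window level and width.

    Voxels below level - width/2 clip to 0 and voxels above level + width/2 clip to 1.
    """
    low, high = level - width / 2.0, level + width / 2.0
    return (np.clip(volume_hu, low, high) - low) / (high - low)

# Example: apply to a resampled (0.8 mm isotropic) CTA volume loaded as HU values.
# intensities = window_hu(cta_volume)
```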

For the brain tumor MR images, we used 210 studies of glioblastoma (GBM) from the Brain Tumor Segmentation Challenge 2018 (BraTS18) training dataset72,73,74. The multi-modal MRI datasets were acquired with different clinical protocols and various scanners from 19 different institutions, including the Center for Biomedical Image Computing and Analytics (CBICA), the Cancer Imaging Archive (TCIA), and other contributors (OTHER). We used 168 studies for training and validation and 42 for testing. Each case comprises four MRI modalities: native (T1), T1 with gadolinium-enhancing contrast (T1c), T2-weighted (T2), and T2 Fluid Attenuated Inversion Recovery (FLAIR). The ground-truth annotation contains three types of tumor sub-regions: tumor core, enhancing tumor, and edema. All modalities have been aligned to a common space and resampled to 1 mm isotropic resolution74.

For the histopathology images, we used the multi-organ nuclei image dataset (Nuclei)75. Its public training set contains 30 digital microscopic tissue images from 30 patients and about 22,000 annotated nuclear boundaries in total (including both epithelial and stromal nuclei). These images of size 1000 × 1000 came from 18 different hospitals spanning seven organs. We selected four organs, the breast, kidney, liver, and prostate, to form a temporal dataset for evaluating continuous learning. Each dataset at a time point contains data from one of the organs. In our experiment, the training set of each center has 4 images from one organ. The testing set has 2 images per organ. In the preprocessing step76, we first performed color normalization77 for all images. Then, each image was divided into 16 (4 × 4) overlapping tiles of size 286 × 286 to form the dataset in the experiment. Therefore, the training set has 64 images in each simulated data center and the testing set has 64 distinct image samples from different organs. In the training of the segmentation model, we used a tile size of 256 × 256, which is the same size as the input and output of the generator in DSL.
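
The 4 × 4 overlapping tiling described above can be sketched as follows; the stride of 238 pixels is our own inference from the stated sizes (four 286-pixel tiles spanning a 1000-pixel side), not a value given in the text.

```python
import numpy as np

def tile_image(image: np.ndarray, tile: int = 286, grid: int = 4) -> list:
    """Split an H x W (x C) image into a grid x grid set of overlapping tiles of size `tile`."""
    h, w = image.shape[:2]
    stride_y = (h - tile) // (grid - 1)   # 238 for a 1000-pixel side
    stride_x = (w - tile) // (grid - 1)
    tiles = []
    for iy in range(grid):
        for ix in range(grid):
            y, x = iy * stride_y, ix * stride_x
            tiles.append(image[y:y + tile, x:x + tile])
    return tiles

# patches = tile_image(color_normalized_image)   # 16 tiles of 286 x 286 per image
```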

Network architecture

Our proposed DSL comprises one central generator and multiple distributed discriminators located in different local nodes. An overview of the proposed architecture is shown in Fig. 1. The central generator, denoted as G, takes task-specific inputs (e.g., segmentation masks in our use case) and generates synthetic images to fool the discriminators. Let N denote the number of participating entities that collaborate in the learning framework, and \(\mathbb{S}_j=\{(\mathbf{x}_i^j,\mathbf{y}_i^j)\}\) denote the local private dataset of size \(|\mathbb{S}_j|\) at the j-th entity, where x is an auxiliary variable representing an annotation, such as a class label or segmentation mask, y is the corresponding real image data, and \(i\in\{1,\ldots,|\mathbb{S}_j|\}\) is the sample index. The local discriminators, denoted as \(D_j\), \(j=1,\ldots,N\), learn to differentiate between the local real images \(\mathbf{y}_i^j\) and the synthetic images \(\hat{\mathbf{y}}_i^j=G(\mathbf{x}_i^j)\) generated by G from \(\mathbf{x}_i^j\). Our architecture ensures that the \(D_j\) deployed in the j-th medical entity only has access to its local dataset and that no real image data are shared outside the entity. Only synthetic images, annotations, and losses are transferred between the central generator and the distributed discriminators during the learning process.

Central generator

For segmentation tasks, the central generator is designed to generate images based on input masks so that a synthetic image and its corresponding mask can be used as a pair to train a segmentation model. Here, an encoder-decoder ResNet52 is adopted for G. It consists of nine residual blocks78, two stride-2 convolutions for downsampling, and two transposed convolutions for upsampling. All non-residual convolutional layers are followed by batch normalization79 and the ReLU activation. All convolutional layers use 3 × 3 kernels except the first and last layers, which use 7 × 7 kernels.
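
For concreteness, a PyTorch sketch of such an encoder-decoder ResNet generator is given below (7 × 7 first and last kernels, two stride-2 convolutions, nine residual blocks, two transposed convolutions, batch normalization and ReLU after non-residual convolutions). The channel widths, padding choices, dropout rate, and Tanh output are our assumptions, not necessarily the authors' exact configuration.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Dropout(0.5),  # dropout injects noise, kept on at training and inference
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return x + self.block(x)

class Generator(nn.Module):
    """Encoder-decoder ResNet: 7x7 stem, 2 stride-2 convs, 9 residual blocks, 2 transposed convs."""
    def __init__(self, in_ch: int = 1, out_ch: int = 1, base: int = 64):
        super().__init__()
        layers = [nn.Conv2d(in_ch, base, 7, padding=3), nn.BatchNorm2d(base), nn.ReLU(inplace=True)]
        ch = base
        for _ in range(2):                      # downsampling path
            layers += [nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
                       nn.BatchNorm2d(ch * 2), nn.ReLU(inplace=True)]
            ch *= 2
        layers += [ResidualBlock(ch) for _ in range(9)]
        for _ in range(2):                      # upsampling path
            layers += [nn.ConvTranspose2d(ch, ch // 2, 3, stride=2, padding=1, output_padding=1),
                       nn.BatchNorm2d(ch // 2), nn.ReLU(inplace=True)]
            ch //= 2
        layers += [nn.Conv2d(ch, out_ch, 7, padding=3), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, mask):
        return self.net(mask)
```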

Distributed discriminators

In our framework, each discriminator has the same structure as that in PatchGAN52. The discriminator classifies each of the overlapping patches of the input image as real or fake. Such an architecture assumes patch-wise independence of pixels in a Markov random field fashion52,80, and the patch is large enough (70 × 70) to capture differences in geometrical structures such as background and tumors.
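
A typical 70 × 70 PatchGAN discriminator, conditioned on the mask as in pix2pix-style setups, could look like the sketch below; the channel widths and the use of LeakyReLU follow common PatchGAN implementations and are our assumptions rather than details stated in the text.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN: outputs one real/fake logit per overlapping ~70x70 patch of (image, mask)."""
    def __init__(self, in_ch: int = 2, base: int = 64):   # image channel(s) + mask channel(s)
        super().__init__()

        def block(ci, co, stride, norm=True):
            layers = [nn.Conv2d(ci, co, 4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.BatchNorm2d(co))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.net = nn.Sequential(
            *block(in_ch, base, 2, norm=False),
            *block(base, base * 2, 2),
            *block(base * 2, base * 4, 2),
            *block(base * 4, base * 8, 1),
            nn.Conv2d(base * 8, 1, 4, padding=1))          # patch-wise real/fake map

    def forward(self, image, mask):
        return self.net(torch.cat([image, mask], dim=1))
```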

The generator can learn the joint distribution of multiple isolated datasets through adversarial learning. It can then be used as an image provider to generate training samples for downstream tasks. Assuming the distribution of synthetic images, \(p_{\hat{\mathbf{y}}}\), is the same as or similar to that of the real images, \(p_{\mathrm{data}}\), we can generate one large unified dataset that is approximately equal to the union of all the datasets held by the medical entities. In this way, all private image data from each entity are utilized without being shared. To evaluate the synthetic images, we use the generated samples in segmentation tasks to illustrate the effectiveness of the proposed DSL.

Objective function

The DSL is based on the conditional GAN81. The objective function is:

$$\min_{G}\max_{D_{1:N}}V(D_{1:N},G)=\sum_{j\in[N]}\pi_j\left\{\mathbb{E}_{\mathbf{x}\sim s_j(\mathbf{x})}\left[\mathbb{E}_{\mathbf{y}\sim p_{\mathrm{data}}(\mathbf{y}|\mathbf{x})}\log D_j(\mathbf{y}|\mathbf{x})+\mathbb{E}_{\hat{\mathbf{y}}\sim p_{\hat{\mathbf{y}}}(\hat{\mathbf{y}}|\mathbf{x})}\log(1-D_j(\hat{\mathbf{y}}|\mathbf{x}))\right]\right\}$$

(1)

The goal of each \(D_j\) is to maximize Eq. (1), while G minimizes it. In this way, the learned G(x) with maximized D(G(x)) can approximate the real data distribution \(p_{\mathrm{data}}(\mathbf{y}|\mathbf{x})\), and D cannot tell ‘fake’ data from real. x follows a distribution s(x). In this paper, we assume that the joint distribution \(s(\mathbf{x})=\sum_{j=1}^{N}\pi_j s_j(\mathbf{x})\), where \(s_j(\mathbf{x})\) is the marginal distribution of the j-th dataset and \(\pi_j\) represents the prior distribution. In the experiments, we set \(s_j(\mathbf{x})\) to be a uniform distribution and \(\pi_j\propto|\mathbb{S}_j|\), resulting in a uniform distribution s(x). For each sub-distribution, there is a corresponding discriminator \(D_j\) that only receives data generated from the prior \(s_j(\mathbf{x})\). Similar to previous works52,82, we incorporate noise by using Dropout83 at several layers of the generator G in both training and inference, instead of providing a Gaussian noise input to the generator.

The losses of Dj and G are defined in Eq. (2) and Eq. (3), respectively.

$$L_{D_j}=\frac{1}{m}\sum_{i=1}^{m}\left[-\log D_j(\mathbf{y}_i^j|\mathbf{x}_i)-\log(1-D_j(\hat{\mathbf{y}}_i^j|\mathbf{x}_i))\right],$$

(2)

$$L_G=\frac{1}{Nm\sum_j\pi_j}\sum_{j=1}^{N}\pi_j\sum_{i=1}^{m}\left[\log(1-D_j(\hat{\mathbf{y}}_i^j|\mathbf{x}_i))+\lambda_1 L_1(\mathbf{y}_i^j,\hat{\mathbf{y}}_i^j)+\lambda_2 L_P(\mathbf{y}_i^j,\hat{\mathbf{y}}_i^j)\right],$$

(3)

where m is the minibatch size. Besides the adversarial loss, \(L_G\) contains a perceptual loss (\(L_P\))84 and an \(L_1\) loss. In this study, G and \(D_j\) are not on the same server, so Eq. (3) needs to be split into two parts, Eq. (4) and Eq. (5), in order to back-propagate the losses to G.

$$L_{G_j}=\frac{1}{m}\sum_{i=1}^{m}\left[\log(1-D_j(\hat{\mathbf{y}}_i^j|\mathbf{x}_i))+\lambda_1 L_1(\mathbf{y}_i^j,\hat{\mathbf{y}}_i^j)+\lambda_2 L_P(\mathbf{y}_i^j,\hat{\mathbf{y}}_i^j)\right].$$

(4)

$$\nabla_{\hat{\mathbf{y}}}=\frac{1}{N\sum_j\pi_j}\sum_{j=1}^{N}\pi_j\left[\nabla_{\hat{\mathbf{y}}^j}\right],$$

(5)

where \(\nabla_{\hat{\mathbf{y}}^j}=\partial L_{G_j}/\partial\hat{\mathbf{y}}^j\) is computed at node \(D_j\) based on the loss in Eq. (4) and then sent back to G for aggregation (Eq. (5)). The learning process is summarized in Supplementary Algorithm 1. We trained for 200 epochs in all tasks and updated each discriminator once per training iteration. The gradient-based updates can use any standard gradient-based learning rule. We used the Adam optimizer85 with a learning rate of 0.0002 in our experiments.
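
To make the split of Eqs. (3)-(5) concrete, the sketch below shows one training iteration from the perspective of a local node and the central server: the node updates its discriminator (Eq. (2)), computes its generator loss \(L_{G_j}\) (Eq. (4)), and returns only the gradient with respect to the synthetic images, which the server aggregates (Eq. (5)) and back-propagates through G. Function names, the stability constant, and the perceptual-loss callable are our own; communication is abstracted away, and \(D_j\) is assumed to output probabilities in (0, 1).

```python
import torch
import torch.nn.functional as F

def node_step(D_j, opt_D_j, x, y_real, y_fake, lam1, lam2, perceptual):
    """One local step at entity j; returns dL_{G_j}/d y_fake for the server (Eq. 5)."""
    eps = 1e-8
    # Discriminator update (Eq. 2)
    opt_D_j.zero_grad()
    loss_D = -(torch.log(D_j(y_real, x) + eps).mean()
               + torch.log(1 - D_j(y_fake.detach(), x) + eps).mean())
    loss_D.backward()
    opt_D_j.step()
    # Generator loss at this node (Eq. 4); gradient taken w.r.t. the synthetic images only
    y_fake = y_fake.detach().requires_grad_(True)
    loss_G_j = (torch.log(1 - D_j(y_fake, x) + eps).mean()
                + lam1 * F.l1_loss(y_fake, y_real)
                + lam2 * perceptual(y_fake, y_real))
    loss_G_j.backward()
    return y_fake.grad            # sent back to the central generator

def server_aggregate(opt_G, y_hat_list, node_grads, weights):
    """Aggregate the nodes' gradients (Eq. 5) and back-propagate them through G."""
    opt_G.zero_grad()
    total = sum(weights)
    for y_hat_j, grad_j, w_j in zip(y_hat_list, node_grads, weights):
        # y_hat_j = G(x_j) was produced on the server, so its graph reaches G's parameters
        y_hat_j.backward(gradient=(w_j / total) * grad_j)
    opt_G.step()
```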

Extension for multi-modality datasets

For a use case with multi-modality data, assuming c modalities, local data center j has a set of multi-modality images \(\mathbf{y}_i^j=(\mathbf{y}_{i,1}^j,\ldots,\mathbf{y}_{i,c}^j)\) associated with each label image \(\mathbf{x}_i^j\). A simple way of handling multi-modality images in our framework would be to treat the c modalities of one sample as a c-channel image. Thus, the only changes needed are the number of channels of the input layer of D and of the output layer of G. In this setting, the learning task of D could be easier and converge very quickly, since different modalities have different contrast patterns and more information can be used to differentiate the real and the ‘fake’ data. However, the task of G may become more challenging to learn. On the one hand, G needs to learn a more complex data distribution to generate multiple modalities with different contrasts. On the other hand, the easily-learned D may learn some trivial discriminative features and thus cannot provide helpful feedback to guide the learning of G.

To balance the task difficulty of G and the Ds, we extend our framework by deploying multiple discriminators at each entity. Each modality has its own discriminator in each data center, and G receives losses from multiple Ds for each multi-modality data sample. In this way, each D can focus on learning discriminative features for one specific modality and provide more meaningful feedback to G. The objective function can be extended from Eq. (1) as:

$$\min_{G}\max_{D_{1:N}^{1:c}}V(D_{1:N}^{1:c},G)=\sum_{j\in[N]}\pi_j\left\{\mathbb{E}_{\mathbf{x}\sim s_j(\mathbf{x})}\sum_{k=1}^{c}\left[\mathbb{E}_{\mathbf{y}_k\sim p_{\mathrm{data}}(\mathbf{y}_k|\mathbf{x})}\log D_{j,k}(\mathbf{y}_k|\mathbf{x})+\mathbb{E}_{\hat{\mathbf{y}}_k\sim p_{\hat{\mathbf{y}}}(\hat{\mathbf{y}}_k|\mathbf{x})}\log(1-D_{j,k}(\hat{\mathbf{y}}_k|\mathbf{x}))\right]\right\},$$

(6)

where Dj,k represents the discriminator for the k-th modality at the center j.

Another advantage of the proposed multi-modality framework is that it enables learning from data with missing modalities. Let \(C_j\) denote the set of indices of the modalities available at center j; if, for example, data center j lacks the k-th modality, then \(C_j=\{1,\ldots,k-1,k+1,\ldots,c\}\). In this case, center j only needs to deploy c − 1 discriminators during learning. The learning process is unchanged except that only the losses of the available discriminators indexed by \(C_j\) are collected to update G, and only the subset of synthetic images \(\{\hat{\mathbf{y}}_k^j\,|\,k\in C_j\}\) is used to update the corresponding discriminators \(\{D_{j,k}\,|\,k\in C_j\}\) in center j. Because the discriminators for different modalities in different entities are all independent, G can still learn to generate all modalities, assuming that a modality missing in one center is available in some other data centers. The loss function of D is the same, while the loss function of G can be adjusted as follows:

$$L_G=\frac{1}{Nm\sum_j\pi_j}\sum_{j=1}^{N}\pi_j\sum_{i=1}^{m}\sum_{k\in C_j}\left[\log(1-D_{j,k}(\hat{\mathbf{y}}_{i,k}^j|\mathbf{x}_i))+\lambda_1 L_1(\mathbf{y}_{i,k}^j,\hat{\mathbf{y}}_{i,k}^j)+\lambda_2 L_P(\mathbf{y}_{i,k}^j,\hat{\mathbf{y}}_{i,k}^j)\right].$$

(7)

After training, the learned G can act as a synthetic image provider to generate multi-modality images from the conditional variable, a mask image. As a result, it can also be used for missing-modality completion. For instance, if a data center has data \((\mathbf{y}_1,\ldots,\mathbf{y}_{k-1},\mathbf{y}_{k+1},\ldots,\mathbf{y}_c)\) with the k-th modality missing and the corresponding mask image x, we can use the synthetic image at the k-th channel of G(x) as a substitute. Our approach differs from existing methods that predict a target modality from another modality53,86 in that it can generate multiple modalities to handle randomly missing modalities, and thus does not require a dedicated model for each specific input-output modality pair.
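
A minimal usage sketch of this completion step follows; the function and variable names are hypothetical, and G stands for the trained multi-modality generator.

```python
import torch

def complete_missing_modality(G, x: torch.Tensor, k: int) -> torch.Tensor:
    """Return the k-th channel of G(x) as a substitute for the missing modality."""
    with torch.no_grad():
        y_hat = G(x)                    # shape (batch, c, H, W), one channel per modality
    return y_hat[:, k:k + 1]
```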

Extension for temporal datasets

Another variation of DSL contains a central generator and multiple distributed temporary discriminators located in the data centers. Suppose the training starts at time t − 1 with \(K_{t-1}\) online local data centers. The central generator \(G_{t-1}\) learns the distribution of all online inputs and outputs synthetic images. The local discriminators \(\{D_{t-1}^1,\ldots,D_{t-1}^{K_{t-1}}\}\) learn to distinguish the synthetic images from the local real images. At time t, new data centers come online, and the real data and discriminators of time t − 1 are no longer available. The central generator \(G_t\) tries to learn the distribution of the new data while retaining the mixture distribution learnt from previous data. The learning of new data is achieved by a digesting loss, and the memory of previously learnt knowledge is kept by a reminding loss.

We assume the conditional distribution is consistent over time. The loss function of TDGAN consists of two parts:

$$\begin{aligned}V_t(G_t,D_t^{1:K_t})&=\min_{G_t}\ L_{\mathrm{Digesting}}+\lambda\cdot L_{\mathrm{Reminding}}\\ \text{Digesting loss:}\quad L_{\mathrm{Digesting}}&\stackrel{\Delta}{=}\max_{D_t^{1:K_t}}\sum_{j=1}^{K_t}\pi_t^j\,\mathbb{E}_{\mathbf{x}\sim s_t^j(\mathbf{x})}\left\{\mathbb{E}_{\mathbf{y}\sim p_{\mathrm{data}}(\mathbf{y}|\mathbf{x})}[\log D_t^j(\mathbf{y}|\mathbf{x})]+\mathbb{E}_{\hat{\mathbf{y}}\sim p_{\hat{\mathbf{y}}}(\hat{\mathbf{y}}|\mathbf{x})}[\log(1-D_t^j(G_t(\mathbf{x})|\mathbf{x}))]\right\}\\ \text{Reminding loss:}\quad L_{\mathrm{Reminding}}&\stackrel{\Delta}{=}\mathbb{E}_{\mathbf{x}\sim s_{t-1}(\mathbf{x})}\mathbb{E}_{\hat{\mathbf{y}}\sim p_{\hat{\mathbf{y}}}(\hat{\mathbf{y}}|\mathbf{x})}\left[\|G_t(\mathbf{x})-G_{t-1}(\mathbf{x})\|^2\right]\end{aligned}$$

(8)

The digesting loss, \(L_{\mathrm{Digesting}}\), uses the mixture cross-entropy loss term to supervise the generator in learning from the new data at time t. The reminding loss, \(L_{\mathrm{Reminding}}\), is formulated as a squared-norm loss that encourages the generator to retain the learned distribution of past data.
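
A sketch of how the generator-side objective in Eq. (8) could be assembled at time t is given below, assuming the adversarial (digesting) terms are collected from the online centers as in the base framework; the function names, the frozen-snapshot handling, and the default λ are placeholders rather than the authors' implementation.

```python
import torch

def tdgan_generator_loss(G_t, G_prev, digesting_terms, x_prev, lam=1.0):
    """Digesting loss (adversarial terms from the current centers) plus the reminding
    loss that keeps G_t close to the frozen snapshot G_{t-1} on previously seen masks."""
    digesting = sum(digesting_terms)                 # weighted generator losses from online centers
    with torch.no_grad():
        target = G_prev(x_prev)                      # frozen generator from time t-1
    reminding = torch.mean((G_t(x_prev) - target) ** 2)
    return digesting + lam * reminding
```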

Distributed FID for image quality measurement

The Fréchet Inception Distance (FID)32 has been widely used to evaluate image quality by calculating the distance between the statistics of feature vectors of real and generated images. The FID is defined as:

$$\mathrm{FID}=\|\boldsymbol{\mu}_1-\boldsymbol{\mu}_2\|^2+\mathrm{Tr}\left(\boldsymbol{\sigma}_1+\boldsymbol{\sigma}_2-2\sqrt{\boldsymbol{\sigma}_1\boldsymbol{\sigma}_2}\right),$$

(9)

where \(\boldsymbol{\mu}_1\) and \(\boldsymbol{\mu}_2\) are the feature-wise means of the real and generated images, \(\boldsymbol{\sigma}_1\) and \(\boldsymbol{\sigma}_2\) are the covariance matrices of the real and generated feature vectors, and Tr refers to the trace operation in linear algebra.

Though FID is an ideal metric for selecting the best model when training a GAN26,87, we cannot compute a single FID score in distributed learning because a joint set of the isolated real data does not exist. Therefore, we propose a new metric, the distributed FID (DistFID), which calculates the weighted average distance between each real dataset and the synthetic dataset. The DistFID is defined as:

$$\mathrm{DistFID}=\sum_{j=1}^{N}w_j\left(\|\boldsymbol{\mu}_1^j-\boldsymbol{\mu}_2\|^2+\mathrm{Tr}\left(\boldsymbol{\sigma}_1^j+\boldsymbol{\sigma}_2-2\sqrt{\boldsymbol{\sigma}_1^j\boldsymbol{\sigma}_2}\right)\right)$$

(10)

in which each of the N entities hosts a dataset \(\mathbb{S}_j\) of size \(|\mathbb{S}_j|\) with feature statistics \((\boldsymbol{\mu}_1^j,\boldsymbol{\sigma}_1^j)\), and the weight of each center is \(w_j=|\mathbb{S}_j|/\sum_{j=1}^{N}|\mathbb{S}_j|\). At the beginning of training DSL, each client center sends its feature-wise statistics (\(\boldsymbol{\mu}_1^j\) and \(\boldsymbol{\sigma}_1^j\)) to the central server. The central server can then use the synthetic images to compute the DistFID value based on Eq. (10) and evaluate the generator. We validated the consistency between the FID and DistFID scores in Supplementary Fig. 2.
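
The sketch below computes Eq. (10) from the per-center feature statistics \((\boldsymbol{\mu}_1^j,\boldsymbol{\sigma}_1^j)\) and the statistics of the synthetic images; the matrix square root comes from SciPy, and the function names are our own.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians fitted to Inception features (Eq. 9)."""
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):          # numerical noise can produce tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * covmean))

def dist_fid(center_stats, center_sizes, mu_syn, sigma_syn):
    """DistFID (Eq. 10): dataset-size-weighted average of per-center Frechet distances
    between each real dataset's statistics and one shared synthetic dataset."""
    weights = np.asarray(center_sizes, dtype=float)
    weights /= weights.sum()
    return float(sum(w * frechet_distance(mu_j, sig_j, mu_syn, sigma_syn)
                     for w, (mu_j, sig_j) in zip(weights, center_stats)))
```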

Learning of downstream task

In this study, we used segmentation as the downstream machine-learning task and also evaluated a classification task. After obtaining a well-trained image generator from the DSL, we can generate synthetic medical images from mask (label) images. In our experiments, to fairly compare the effect of the synthetic images with that of the real samples, we adopted the same U-Net88 as the segmentation model and the same VGG34 network as the classification model to learn on different sets of 2D images. When training the downstream task, we withheld 20% of the samples from the training data as a validation set to select the model with the best Dice score for testing. We used the Adam optimizer with a learning rate of 0.01 to train the segmentation models in our experiments. For the cardiac CTA and brain MRI segmentation tasks, a combination of cross-entropy (CE) and Dice losses was used as the loss function. For the nuclear segmentation task, the CE loss was used. For the classification task, the binary cross-entropy loss was used. We ran inference on every 2D image in testing, and for the cardiac CTA and brain MRI data we computed the 3D metrics, reported in the Results, by stacking up the 2D images of the same subject. Note that the reported standard deviations in the Results section were computed with the degrees of freedom equal to the number of samples.
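
As an illustration of the combined CE + Dice objective mentioned above for the cardiac CTA and brain MRI tasks, a sketch follows; the equal weighting of the two terms and the soft-Dice formulation are our assumptions, not values stated in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CEDiceLoss(nn.Module):
    """Cross-entropy plus soft Dice loss for multi-class 2D segmentation."""
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.eps = eps

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # logits: (B, C, H, W); target: (B, H, W) integer class labels
        ce = self.ce(logits, target)
        probs = torch.softmax(logits, dim=1)
        onehot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
        inter = (probs * onehot).sum(dim=(0, 2, 3))
        denom = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
        soft_dice = (2.0 * inter + self.eps) / (denom + self.eps)
        return ce + (1.0 - soft_dice.mean())
```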

Quantitative metrics

The Dice score (Dice) and 95% quantile of Hausdorff distance (HD95) are adopted to evaluate the segmentation performance on cardiac CTA and BraTS1874. The Dice score measures the overlap between ground-truth mask \(\mathcal{G}\) and segmented result \(\mathcal{S}\). It is defined as

$$\mathrm{Dice}(\mathcal{G},\mathcal{S})=\frac{2|\mathcal{G}\cap\mathcal{S}|}{|\mathcal{G}|+|\mathcal{S}|}.$$

(11)

The Hausdorff Distance (HD) evaluates the distance between boundaries of ground-truth and segmented masks:

$$\mathrm{HD}(\mathcal{G},\mathcal{S})=\max\left\{\sup_{\mathbf{u}\in\partial\mathcal{G}}d(\mathbf{u},\partial\mathcal{S}),\ \sup_{\mathbf{v}\in\partial\mathcal{S}}d(\mathbf{v},\partial\mathcal{G})\right\},$$

(12)

where ∂ denotes the boundary operation, \(d(\mathbf{u},\partial\mathcal{S})=\inf_{\mathbf{v}\in\partial\mathcal{S}}\|\mathbf{u}-\mathbf{v}\|_2\) is the minimum distance from vertex \(\mathbf{u}\) to the surface \(\partial\mathcal{S}\), \(\sup\) represents the supremum, and \(\inf\) the infimum. Because the Hausdorff distance is sensitive to outliers in \(\mathcal{G}\) or \(\mathcal{S}\), we use the 95% quantile Hausdorff distance (HD95):

$$\mathrm{HD95}(\mathcal{G},\mathcal{S})=\max\left\{\sup_{\mathbf{u}\in\partial\mathcal{G}}^{95}d(\mathbf{u},\partial\mathcal{S}),\ \sup_{\mathbf{v}\in\partial\mathcal{S}}^{95}d(\mathbf{v},\partial\mathcal{G})\right\},$$

(13)

where \(\sup^{95}\) denotes the 95th-percentile maximum value. In addition, we report the average surface distance (SD) as follows:

$$\mathrm{SD}(\mathcal{G},\mathcal{S})=\frac{1}{2}\left\{\frac{1}{|\partial\mathcal{G}|}\sum_{\mathbf{u}\in\partial\mathcal{G}}d(\mathbf{u},\partial\mathcal{S})+\frac{1}{|\partial\mathcal{S}|}\sum_{\mathbf{v}\in\partial\mathcal{S}}d(\mathbf{v},\partial\mathcal{G})\right\}.$$

(14)
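
For reference, a NumPy/SciPy sketch of the volumetric Dice (Eq. (11)) and HD95 (Eq. (13)) for binary masks is given below; the boundary is extracted as the mask minus its erosion, and distances are computed brute-force pairwise, which is an illustrative choice that is only practical for moderate boundary sizes.

```python
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial.distance import cdist

def dice(gt: np.ndarray, seg: np.ndarray) -> float:
    """Dice overlap (Eq. 11) between two binary masks."""
    inter = np.logical_and(gt, seg).sum()
    return 2.0 * inter / (gt.sum() + seg.sum())

def boundary_points(mask: np.ndarray) -> np.ndarray:
    """Coordinates of boundary voxels: the mask minus its binary erosion."""
    mask = mask.astype(bool)
    return np.argwhere(mask & ~binary_erosion(mask))

def hd95(gt: np.ndarray, seg: np.ndarray) -> float:
    """95% quantile Hausdorff distance (Eq. 13) between the two boundaries."""
    pg, ps = boundary_points(gt), boundary_points(seg)
    d = cdist(pg, ps)                     # pairwise Euclidean distances (in voxel units)
    return max(np.percentile(d.min(axis=1), 95), np.percentile(d.min(axis=0), 95))
```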

For nuclei segmentation, we utilize the object-level Dice89 and the Aggregated Jaccard Index (AJI)75:

$$\mathrm{AJI}(\mathcal{G},\mathcal{S})=\frac{\sum_{i=1}^{n_{\mathcal{G}}}|\mathcal{G}_i\cap\mathcal{S}(\mathcal{G}_i)|}{\sum_{i=1}^{n_{\mathcal{G}}}|\mathcal{G}_i\cup\mathcal{S}(\mathcal{G}_i)|+\sum_{k\in\mathcal{O}}|\mathcal{S}_k|},$$

(15)

where \(n_{\mathcal{G}}\) is the number of ground-truth objects in \(\mathcal{G}\), \(\mathcal{S}(\mathcal{G}_i)\) represents the segmented object that has maximum overlap with \(\mathcal{G}_i\) with regard to the Jaccard index, and \(\mathcal{O}\) is the set containing segmented objects that have not been assigned to any ground-truth object.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
