
Detection of microcalcifications in photon-counting dedicated breast-CT using a deep convolutional neural network: Proof of principle

Open Access. Published: December 27, 2022. DOI: https://doi.org/10.1016/j.clinimag.2022.12.006

      Highlights

      • Breast-CT is a new imaging modality; the resulting lack of data can be compensated by using mammographic data for training.
      • A dCNN trained with mammographic images can detect microcalcifications on breast-CT images with high accuracy.
      • Healthy tissue is often misclassified as suspicious calcifications, requiring a second reading by a radiologist for accurate diagnosis.

      Abstract

      Objective

      In this study, we investigate the feasibility of a deep Convolutional Neural Network (dCNN), trained with mammographic images, to detect and classify microcalcifications (MC) in breast-CT (BCT) images.

      Methods

      This retrospective single-center study was approved by the local ethics committee. 3518 icons generated from 319 mammograms were classified into three classes: “no MC” (1121), “probably benign MC” (1332), and “suspicious MC” (1065). A dCNN was trained (70% of data), validated (20%), and tested on a “real-world” dataset (10%). The diagnostic performance of the dCNN was tested on a subset of 60 icons, generated from 30 mammograms and 30 breast-CT images, and compared to human reading. ROC analysis was used to calculate diagnostic performance. Moreover, colored probability maps for representative BCT images were calculated using a sliding-window approach.

      Results

      The dCNN reached an accuracy of 98.8% on the “real-world” dataset. On the subset of 60 icons, the accuracy was 100% for mammographic images; for breast-CT images, it was 60% for “no MC”, 80% for “probably benign MC” and 100% for “suspicious MC”. Intra-class correlation between the dCNN and the readers was almost perfect (0.85). Kappa values between the two readers (0.93) and between each reader and the dCNN (reader 1: 0.85; reader 2: 0.82) were almost perfect. The sliding-window approach successfully detected suspicious MC with high image quality. The diagnostic performance of the dCNN in classifying benign and suspicious MC was excellent, with an AUC of 93.8% (95% CI 87.4%–100%).

      Conclusion

      Deep convolutional networks can be used to detect and classify benign and suspicious MC in breast-CT images.

      Abbreviations:

      BC (breast cancer), BCT (breast computed tomography), DCIS (ductal carcinoma in situ), dCNN (deep Convolutional Neural Network), MC (microcalcifications)

      Keywords

      1. Introduction

      With an estimated 2.3 million new cases per year, breast cancer (BC) constitutes the most common cause of cancer death in the female population (15.5%).
      • Sung H.
      • Ferlay J.
      • Siegel R.L.
      • et al.
      Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries.
      With the implementation of quality-controlled mammography screening programs in numerous countries, the number of women diagnosed with suspicious alterations in breast tissue has increased significantly. The main reason for the increased detection rate of such malignant processes is microcalcifications (MC), which are responsible for the detection of 85–95% of cases of ductal carcinoma in situ (DCIS) during mammography screening.
      • Stomper P.C.
      • Geradts J.
      • Edge S.B.
      • Levine E.G.
      Mammographic predictors of the presence and size of invasive carcinomas associated with malignant microcalcification lesions without a mass.
      Microcalcifications are small calcium compounds, usually in the form of calcium oxalate and calcium phosphate. While calcium phosphate is related to BC, calcium oxalate compounds are most often related to benign processes.
      • Wilkinson L.
      • Thomas V.
      • Sharma N.
      Microcalcification on mammography: approaches to interpretation and biopsy.
      Since Salomon's first report of microcalcifications in a mastectomy specimen in 1913, the role of microcalcifications in the detection of breast cancer has been deeply investigated.
      • Salomon A.
      Beiträge zur Pathologie und Klinik der Mammacarcinome.
      Although breast radiography has been performed since the 1930s, the clinical use and diagnostic advantage of mammography were not recognized until the early 1950s, when the radiologist Leborgne reported the presence of MC as the only manifestation of malignancy.
      • Bassett L.W.
      • Gold R.H.
      The evolution of mammography.
      • Leborgne R.
      Diagnosis of tumors of the breast by simple roentgenography; calcifications in carcinomas.
      Since the pioneering work of Egan et al. and Gross et al., mammography screening has been regarded as the state of the art for breast cancer detection.
      • Egan R.L.
      Contributions of mammography in the detection of early breast cancer.
      • Strax P.
      • Venet L.
      • Shapiro S.
      • Gross S.
      Mammography and clinical examination in mass screening for cancer of the breast.
      Although mammography is widely used, the assessment of mammographic images depends strongly on the physician's experience. While most readers agree on the presence of MC, there is substantial inter-reader variability in interpreting the distribution and morphology of microcalcifications with regard to the diagnosis of breast cancer. Second reading is an established quality-control measure, but it makes screening programs time-consuming and cost-intensive. To standardize the diagnostic assessment and reporting of mammograms, the American College of Radiology (ACR) released the Breast Imaging Reporting and Data System (BI-RADS), which also indicates the associated risk of malignancy and the appropriate clinical workup. According to the BI-RADS atlas, calcifications are classified into six categories based on morphology as well as distribution. Benign calcifications are often large (>1 mm), round or “popcorn-like” in shape, and show a diffuse distribution. Suspicious MC, in contrast, are more likely to be amorphous, heterogeneous, or pleomorphic in shape and to show a regional, grouped, or segmental distribution. Despite the standardized classification system, the assessment of mammographic MC exhibits low specificity, varying from 10 to 60%. In addition, a false-positive rate of up to 19.7%, caused in particular by superimposed structures, reduces the efficiency of screening programs through high callback rates and unnecessary biopsies.
      • Hofvind S.
      • Ponti A.
      • Patnick J.
      • et al.
      False-positive results in mammographic screening for breast cancer in Europe: a literature review and survey of service screening programmes.
      • Kamangar F.
      • Dores G.M.
      • Anderson W.F.
      Patterns of cancer incidence, mortality, and prevalence across five continents: defining priorities to reduce cancer disparities in different geographic regions of the world.
      Recently, spiral breast-CT has been introduced as a promising 3D breast imaging modality. Providing high isotropic spatial resolution, this non-compression technique offers a serious screening alternative for women refusing conventional mammography due to painful previous experiences. Wetzl et al. reported a significant reduction of pain in women receiving BCT compared to digital mammography in their patient cohort.
      • Berger N.
      • Marcon M.
      • Frauenfelder T.
      • Boss A.
      Dedicated spiral breast computed tomography with a single photon-counting detector: initial results of the first 300 women.
      • Berger N.
      • Marcon M.
      • Saltybaeva N.
      • et al.
      Dedicated breast computed tomography with a photon-counting detector: initial results of clinical in vivo imaging.
      • Wetzl M.
      • Wenkel E.
      • Dietzel M.
      • et al.
      Potential of spiral breast computed tomography to increase patient comfort compared to DM.
      Moreover, BCT reduces the effect of tissue overlay, as observed in superimposed mammograms. As a result, the sensitivity of BCT is reported to be higher compared to mammography at a similar radiation dose, especially in dense breast tissue.
      • Wienbeck S.
      • Uhlig J.
      • Luftner-Nagel S.
      • et al.
      The role of cone-beam breast-CT for breast cancer detection relative to breast density.
      Although recommendations regarding examination parameters based on breast size and tissue compositions have recently been published, optimal reconstruction parameters for all clinical indications are still not established.
      • Germann M.
      • Shim S.
      • Angst F.
      • Saltybaeva N.
      • Boss A.
      Spiral breast computed tomography (CT): signal-to-noise and dose optimization using 3D-printed phantoms.
      Investigations concerning MC in BCT demonstrated the feasibility of detecting MC of 150 μm in diameter using a photon-counting detector in a laboratory setting.
      • Kalender W.A.
      • Beister M.
      • Boone J.M.
      • et al.
      High-resolution spiral CT of the breast at very low dose: concept and feasibility considerations.
      However, phantom studies using clinical set-ups reported slightly inferior spatial resolution with MC of 196 μm visible. Clinical investigations regarding MC in mammography have shown that the size of clinically relevant MC typically ranges between 100 and 200 μm.
      • Shim S.
      • Saltybaeva N.
      • Berger N.
      • et al.
      Lesion detectability and radiation dose in spiral breast CT with photon-counting detector technology: a phantom study.
      Therefore, BCT might miss very small MC that conventional mammography can detect. The inferior spatial resolution can be partially compensated by the ability to display calcifications in 3D detail and in local reference to other calcifications and the surrounding tissue.
      • O'connell A.M.
      • Karellas A.
      • Vedantham S.
      The potential role of dedicated 3D breast CT as a diagnostic tool: review and early clinical examples.
      Nevertheless, the assessment of MC in BCT is a time-intensive workflow because of the many images in one BCT examination. The limited clinical experience with this new imaging technique and the need for a second reading, as in conventional mammography, further increase the resources required for BCT imaging. The labor-intensive data reading and the requirement for standardization demonstrate that an automatic detection technique alerting the radiologist to the presence of MC in BCT images is highly desirable.
      Machine learning as a part of artificial intelligence has been frequently applied to analyze medical images.
      • Becker A.S.
      • Marcon M.
      • Ghafoor S.
      • et al.
      Deep learning in mammography: diagnostic accuracy of a multipurpose image analysis software in the detection of breast cancer.
      For example, a dCNN has been used to successfully classify MC in mammographic images according to the ACR BI-RADS classification system.
      • Schonenberger C.
      • Hejduk P.
      • Ciritsis A.
      • et al.
      Classification of mammographic breast microcalcifications using a deep convolutional neural network: a BI-RADS-based approach.
      In another study, a dCNN has been applied to spiral breast-CT to classify breast density on a BI-RADS-based density atlas with high accuracy.
      • Landsmann A.
      • Wieler J.
      • Hejduk P.
      • et al.
      Applied machine learning in spiral breast-CT: can we train a deep convolutional neural network for automatic, standardized and observer independent classification of breast density?.
      In this current study we investigate the feasibility of a dCNN, trained with mammographic images, to detect and classify MC in breast-CT images.

      2. Materials and methods

      2.1 Patient selection

      A retrospective analysis of patient data in the local Picture Archiving and Communication System (PACS) of our institution was performed and approved by the local ethics committee. Informed consent was waived for this retrospective study. 319 mammograms from 120 patients from the years 2013–2016 and 42 breast-CT examinations from 21 patients from the years 2018–2020 were included in this study. For breast-CT examinations, only slices depicting microcalcifications were used, resulting in a dataset of 1393 images. Both mammographic and BCT images were assessed according to the radiologist's report. Based on the BI-RADS classification system, calcifications were categorized into six groups: 1, no calcifications; 2, benign calcifications; 3, probably benign calcifications; 4, suspicious calcifications; 5, high probability of malignancy; and 6, biopsy-proven carcinoma.

      2.2 Data preparation

      In the first step, the dimensions of the mammograms were adjusted to 3510 × 2800 pixels. For labeling, a custom-made OCTAVE script (Release 5.2.0) was used. For each mammogram, icons depicting rectangular regions of interest (ROIs) were manually labeled with one of the three classes according to the radiologist's report and saved as new images (351 × 380 pixels). The three classes were defined as no microcalcifications (BI-RADS 1), probably benign microcalcifications (BI-RADS 2/3), and suspicious microcalcifications (BI-RADS 4/5). Fig. 1 depicts examples of mammograms for the three classes with magnifications of the corresponding ROIs. Images were randomly shuffled within their classes, and the dataset was split into a training, validation, and test dataset. 70% of the images were used for the training of the dCNN and 20% for the stepwise validation of the model during training. Subsequently, a “real-world” test dataset was created from the remaining 10% of images, which had not been used for training or validation. To evaluate the unbiased performance of the dCNN model, a subset of 30 images of the test dataset was created with 10 images of each class (no MC, probably benign MC, suspicious MC).
      Fig. 1. Mammographic illustrations of the three classes “no microcalcifications” (blue), “probably benign microcalcifications” (yellow), and “suspicious microcalcifications” (red) with magnifications of the corresponding ROIs. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
      To evaluate the performance of the model for breast-CT images, an additional dataset of 30 images (10 of each class) was created. To this end, rectangular ROIs (icons) of the BCT images were manually labeled, cropped, and saved as new images (351 × 380 pixels). Fig. 2 depicts examples of breast-CT images for the three classes with magnifications of the corresponding ROIs. The generated icons were then mixed with the mammographic images, resulting in a subset of 60 icons. The dCNN's classification of the subset was then compared to human reading by two radiologists highly experienced in breast imaging (A.B. 16 years, J.W. 7 years).
      Fig. 2. Breast-CT illustrations of the three classes “no microcalcifications” (blue), “probably benign microcalcifications” (yellow), and “suspicious microcalcifications” (red) with magnifications of the corresponding ROIs. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
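      The class-wise shuffling and 70/20/10 split described in this section can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `stratified_split`, the fixed seed, and the choice to assign rounding remainders to the test set are assumptions, so the per-class counts it produces may differ by one or two images from those reported in the paper.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle indices within each class, then split them into train/val/test."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = int(fractions[0] * len(indices))
        n_val = int(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Class sizes as reported in the abstract.
labels = (["no MC"] * 1121
          + ["probably benign MC"] * 1332
          + ["suspicious MC"] * 1065)
train_idx, val_idx, test_idx = stratified_split(labels)
```

      Splitting within each class keeps the class proportions roughly equal across the three datasets, which matches the description that images were shuffled within their classes before the split.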

      2.3 dCNN architecture and training

      All computations were performed on an ASUS desktop computer equipped with an Intel i7-7700 CPU with 16 GB RAM and an NVIDIA 2080 RTX graphics processing unit with 8 GB graphics RAM. The desktop PC was running Ubuntu 20.04 with TensorFlow 1.0.1 and Keras 2.0.4. All programming was performed in the computer programming language Python (Version 3.8.24; Python Software Foundation, Wilmington, DE).
      Fig. 3. Schematic illustration of the deep Convolutional Neural Network.
      A dCNN model was created to classify the different groups of microcalcifications as described above (BI-RADS 1: no MC, BI-RADS 2/3: probably benign MC, and BI-RADS 4/5: suspicious MC). The dCNN model consisted of 13 convolutional layers followed by max-pooling and two dense layers. The convolutional layers used zero padding to ensure no reduction in resolution. We used the Nesterov momentum optimizer for the optimization of the loss function. Moreover, to prevent overfitting, dropout with a factor of 0.5 was used as a regularization method. A schematic illustration of the dCNN is depicted in Fig. 3; the detailed dCNN architecture is provided as supplemental material. The batch size was set to 40 and the number of epochs to 130. After complete training and validation of the model, the images of the “real-world” test dataset were classified based on the highest probability assigned to the categories no MC, probably benign MC, and suspicious MC.
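      The final classification step, in which each icon receives the class with the highest assigned probability, can be illustrated with a short sketch. It assumes that the network ends in a three-way softmax output, which the text does not state explicitly; the names `CLASSES`, `softmax`, and `classify` are illustrative only.

```python
import math

CLASSES = ("no MC", "probably benign MC", "suspicious MC")


def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the class with the highest probability and all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Hypothetical raw outputs for a single icon.
label, probs = classify([0.1, 0.2, 3.0])
```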

      2.4 Computation of probability maps

      Representative breast-CT images were analyzed with a custom Python script using a sliding-window approach: a nested loop moved the window pixel by pixel in the x- and y-directions over the complete width and height of the BCT image. At each position of the sliding window, a 351 × 280 array was cropped and classified by the trained dCNN model. For each position, the center coordinates of the window and the corresponding probabilities for each class (numerical values between 0 and 1) were stored. Subsequently, the resulting probability array was visualized as a heatmap for “suspicious microcalcifications”, as shown in Fig. 4. The approximate computation time for one breast-CT image ranged between 2 and 5 h on the hardware described above.
      Fig. 4. Two examples of representative breast-CT images (A/D) with calculated “heatmaps” (B/E) and the corresponding overlay images (C/F).
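      The sliding-window procedure of this section can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' script: `probability_map` and its `stride` parameter are hypothetical, and `mean_intensity` is only a placeholder standing in for the trained dCNN, which is not available here. The toy example uses a 3 × 3 window instead of the 351 × 280 window from the text.

```python
def probability_map(image, classify_patch, win_h, win_w, stride=1):
    """Slide a win_h x win_w window over a 2D image given as a list of rows.

    Returns a dict mapping each window's center (row, col) to the score
    that classify_patch returns for the cropped window.
    """
    height, width = len(image), len(image[0])
    heatmap = {}
    for top in range(0, height - win_h + 1, stride):
        for left in range(0, width - win_w + 1, stride):
            patch = [row[left:left + win_w] for row in image[top:top + win_h]]
            center = (top + win_h // 2, left + win_w // 2)
            heatmap[center] = classify_patch(patch)
    return heatmap


def mean_intensity(patch):
    """Placeholder for the dCNN: returns a score between 0 and 1."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


# Toy 6 x 6 image with a single bright pixel.
toy = [[0.0] * 6 for _ in range(6)]
toy[2][3] = 1.0
scores = probability_map(toy, mean_intensity, win_h=3, win_w=3)
```

      With a stride of one pixel, the number of classifier calls is (H − h + 1) × (W − w + 1) for an H × W image and an h × w window, which helps explain why a pixel-wise analysis of a full image can take hours.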

      2.5 Human readout “real-world” subsets

      The subset of 60 icons (30 mammographic and 30 breast-CT images) was presented in random order to the two readers. Both readers were highly experienced in breast imaging (*BLINDED*: 16 years and *BLINDED*: 7 years), particularly in breast-CT imaging (both readers: 4 years). Both readers were blinded to the patient information and the radiologist's report and rated each image individually according to the three classes: no MC (BI-RADS 1), probably benign MC (BI-RADS 2/3), and suspicious MC (BI-RADS 4/5). The classifications of both readers were used to compute the inter-reader agreement between the readers and the dCNN. The initial classification based on the radiologist's report served as the ground truth and was used to calculate the accuracy of the dCNN.

      2.6 Statistical analyses

      Statistical analysis was performed using the SPSS software package (SPSS version 28, International Business Machines Corp., Armonk, NY). The metrics of the confusion matrices on the test dataset were quantified to assess the overall performance of the dCNN compared to the radiologist's report, which served as the ground truth. To assess the inter-reader agreement of the human readout, the intraclass correlation coefficient (ICC) between the dCNN and both readers was calculated. According to Landis and Koch, an ICC >0.80 was considered “almost perfect agreement”.
      • Landis J.R.
      • Koch G.G.
      The measurement of observer agreement for categorical data.
      Interrater reliabilities of the dCNN and both readers were assessed by calculating kappa coefficients. According to Cohen, kappa values of 0.61–0.80 were considered substantial and values of 0.81–0.99 were considered almost perfect.
      • Cohen J.
      Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit.
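      As a worked illustration of the kappa coefficient, the sketch below computes an unweighted Cohen's kappa from a confusion matrix. Using the unweighted form is an assumption, since the text does not state which variant was applied. The example matrix pools the dCNN counts for mammograms and breast-CT images from Table 3; the function name `cohen_kappa` is illustrative.

```python
def cohen_kappa(confusion):
    """Unweighted Cohen's kappa from a square confusion matrix.

    Rows are the categories assigned by rater A, columns those of rater B.
    """
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


# Ground truth (rows) vs dCNN (columns), pooled over Table 3 a) and b).
dcnn_vs_truth = [
    [16, 0, 4],   # no MC
    [1, 18, 1],   # probably benign MC
    [0, 0, 20],   # suspicious MC
]
kappa = cohen_kappa(dcnn_vs_truth)
```

      For these pooled counts the observed agreement is 54/60 and the chance agreement is 1/3, so kappa evaluates to 0.85, in line with the value listed for the dCNN against the ground truth in Table 4.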
      The diagnostic performance of the dCNN to detect and classify benign and suspicious MC, compared to the human readout, was assessed by conducting a receiver operating characteristics (ROC) analysis. Diagnostic accuracies were expressed as the area under the curve (AUC) and compared with DeLong's nonparametric test
      • Delong E.R.
      • Delong D.M.
      • Clarke-Pearson D.L.
      Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach.
      between the two readers and the dCNN. All tests were two-tailed and p-values <0.05 were considered significant.
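      The AUC used in the ROC analysis can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition. The scores and labels are hypothetical and serve only to illustrate the calculation; DeLong's test for comparing correlated AUCs is not implemented here.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical "suspicious MC" probabilities; 1 = suspicious, 0 = benign.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```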

      3. Results

      3.1 Patient cohort

      319 mammograms from 120 patients from the years 2013–2018 and 42 breast-CT examinations from 21 patients from the years 2018–2020 were included in this study. For breast-CT examinations only slices depicting suspicious microcalcifications were used, resulting in a dataset of 1393 images. From manual labeling of rectangular ROIs 3518 icons were created. Demographic data is not routinely documented in the radiologic report or the PACS archive of our institution and was therefore not further analyzed.

      3.2 Training, validation, and “real-world” test datasets

      After image pre-processing, 3518 icons of 319 mammograms were classified into three classes: 1121 images with “no microcalcifications” (32%), 1332 images with “probably benign microcalcifications” (38%), and 1065 images with “suspicious microcalcifications” (30%). The dataset was then split into 70% training data (2463 images), 20% validation data used during model training (704 images), and 10% “real-world” test data not used for the creation of the model (351 images). The training dataset consisted of 785 images of “no MC”, 932 images of “probably benign MC” and 745 images of “suspicious MC”, whereas the validation dataset consisted of 224 images of “no MC”, 266 images of “probably benign MC” and 213 images of “suspicious MC” (Table 1). The “real-world” test dataset of mammographic images not previously used for training or validation consisted of 112 images of “no microcalcifications” (32%), 132 images of “probably benign microcalcifications” (38%), and 107 images of “suspicious microcalcifications” (30%), resulting in a dataset of 351 images.
      Table 1. Number of images used for each dataset.
      Class                 Training   Validation   Test   Total
      No mc                      785          224    112    1121
      Probably benign mc         932          266    132    1332
      Suspicious mc              745          213    107    1065
      Total                     2462          703    351    3518
      mc = microcalcifications.

      3.3 Model accuracies

      The dCNN reached an accuracy of 98.0% at epoch 118 on the training set and an accuracy of 98.8% at epoch 115 on the validation set. Corresponding loss- and training curves for the training and validation datasets are depicted in Fig. 5. The dCNN reached an accuracy of 98.8% in the test dataset. The confusion matrix of the dCNN for the “real-world” test dataset is shown in Table 2.
      Fig. 5. Loss- and training curves for the training- and validation-dataset.
      Table 2. Confusion matrix of the “real-world” test dataset for the deep Convolutional Neural Network (dCNN). The radiologist's report served as the ground truth.
                            Predicted class
      Ground truth          No mc         Probably benign mc   Suspicious mc
      No mc                 111 (99.1%)   1 (0.9%)             0 (0%)
      Probably benign mc    1 (0.8%)      131 (99.2%)          0 (0%)
      Suspicious mc         0             2 (1.9%)             105 (98.1%)
      mc = microcalcifications.
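      The overall accuracy and the per-class percentages on the diagonal of Table 2 can be recomputed from the raw counts. The following sketch shows the arithmetic; the function names are illustrative and not taken from the study.

```python
# Rows: ground truth, columns: predicted class (counts from Table 2).
TABLE_2 = [
    [111, 1, 0],   # no mc
    [1, 131, 0],   # probably benign mc
    [0, 2, 105],   # suspicious mc
]


def overall_accuracy(confusion):
    """Fraction of all images on the diagonal of the confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Fraction of each ground-truth class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


accuracy = overall_accuracy(TABLE_2)   # 347 correct out of 351 images
recalls = per_class_recall(TABLE_2)
```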

      3.4 Human readout

      Fig. 6 shows example images for mammography and breast-CT for each group and the probabilities assigned by the dCNN. The overall inter-reader agreement between the dCNN and both readers was almost perfect with a calculated intraclass correlation coefficient (ICC) of 0.84 (95% CI 0.77, 0.90). Reader 1 (*BLINDED*, 16 years of experience) showed the best overall agreement with the ground truth (97%), followed by reader 2 (95%) and the dCNN (90%). For mammographic images, reader 1 and the dCNN showed perfect agreement (100%) with the radiologist's report, which served as the ground truth. Reader 2 (*BLINDED*, 7 years of experience) performed better on icons generated from breast-CT images (97% agreement) than on mammographic images (93% agreement). For breast-CT images, the agreement between the dCNN and the ground truth was excellent (80%). Regarding the different classes, the dCNN was able to accurately classify probably benign (80%) and suspicious (100%) microcalcifications but misclassified many icons showing healthy parenchyma as “suspicious mc” (40%). Table 3 shows the confusion matrices for the dCNN and the two readers compared to the ground truth for mammographic and BCT images. Inter-reader reliability between the two readers (0.93) and between each reader and the dCNN (reader 1: 0.85; reader 2: 0.82) was almost perfect. Kappa values between both readers, the dCNN and the radiologist's report, which served as the ground truth, are listed in Table 4. Diagnostic performance of the dCNN was excellent with an area under the receiver operating characteristics curve (AUC) of 93.8% (95% CI: 87.4%–100%), slightly inferior to the two readers (reader 1: 98.8%, reader 2: 97.5%), as depicted in Fig. 7. All statistical comparisons were significant with p-values <0.001 in each case.
      Fig. 6. Example images for mammography and breast CT for each group and the assigned probabilities.
      Table 3. Classification results of the two readers and the deep Convolutional Neural Network (dCNN) compared to the radiologist's report (ground truth) for mammograms (a) and breast-CT images (b).
                     Predicted class
                     dCNN                             Reader 1                         Reader 2
      Ground truth   no mc      pb mc      s mc       no mc      pb mc      s mc       no mc      pb mc      s mc
      a) Mammograms
      no mc          10 (100%)  0          0          10 (100%)  0          0          9 (90%)    1 (10%)    0
      pb mc          0          10 (100%)  0          0          10 (100%)  0          0          9 (90%)    1 (10%)
      s mc           0          0          10 (100%)  0          0          10 (100%)  0          0          10 (100%)
      b) Breast-CT images
      no mc          6 (60%)    0          4 (40%)    9 (90%)    1 (10%)    0          10 (100%)  0          0
      pb mc          1 (10%)    8 (80%)    1 (10%)    0          9 (90%)    1 (10%)    0          9 (90%)    1 (10%)
      s mc           0          0          10 (100%)  0          0          10 (100%)  0          0          10 (100%)
      mc = microcalcifications; pb = probably benign; s = suspicious.
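      To make the class-wise reading of Table 3 concrete, the sketch below treats “suspicious mc” as the positive class for the dCNN on breast-CT icons (part b) and derives the sensitivity and the false positive rate in a one-vs-rest fashion. The function name `one_vs_rest` is illustrative, and the one-vs-rest framing is an assumption about how such rates can be derived from a three-class matrix.

```python
# Rows: ground truth, columns: dCNN prediction (Table 3 b, breast-CT icons).
BCT_DCNN = [
    [6, 0, 4],    # no mc
    [1, 8, 1],    # probably benign mc
    [0, 0, 10],   # suspicious mc
]


def one_vs_rest(confusion, cls):
    """Sensitivity and false positive rate with class index `cls` as positive."""
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[r][cls] for r in range(len(confusion)) if r != cls)
    tn = sum(sum(row) for row in confusion) - tp - fn - fp
    return tp / (tp + fn), fp / (fp + tn)


sensitivity, false_positive_rate = one_vs_rest(BCT_DCNN, cls=2)
```

      Here all 10 suspicious icons are detected, while 5 of the 20 non-suspicious icons are flagged as suspicious, giving a false positive rate of 0.25. This appears consistent with the 25% false positive rate for BCT images mentioned in the Discussion.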
      Table 4. Kappa values between both readers, the deep Convolutional Neural Network (dCNN), and the radiologist's report (ground truth).
                      Ground truth   dCNN   Reader 1   Reader 2
      Ground truth    1              0.85   0.95       0.93
      dCNN                           1      0.85       0.82
      Reader 1                              1          0.93
      Reader 2                                         1
      Fig. 7. Diagnostic performance of the deep Convolutional Neural Network compared to the human readers, depicted as receiver operating characteristics (ROC) curve with the corresponding area under the curve (AUC).

      4. Discussion

      In our study, we were able to show that a deep Convolutional Neural Network (dCNN), trained with mammographic images, can be used to detect and accurately classify microcalcifications in breast-CT images. As breast-CT imaging data is still limited due to the novelty of this modality, we used data from conventional mammographies to train a dCNN, which was successfully applied to analyze breast-CT images. Our study is a proof of principle demonstrating that applying the dCNN to BCT examinations might serve as an additional diagnostic tool for the detection of MC and potentially reduce the evaluation time.
      Mammography is the state-of-the-art modality for breast cancer screening. The false positive rate, however, is reported to be up to 20%, caused in particular by superimposition effects.
      • Hofvind S.
      • Ponti A.
      • Patnick J.
      • et al.
      False-positive results in mammographic screening for breast cancer in Europe: a literature review and survey of service screening programmes.
      Therefore, research institutions and industry have made considerable efforts to develop a truly three-dimensional breast imaging technique using X-rays. With the introduction of tomosynthesis, the effect of overlays has been reduced; yet image acquisition at different angles does not provide a full 3D image dataset.
      • O'connell A.M.
      • Karellas A.
      • Vedantham S.
      The potential role of dedicated 3D breast CT as a diagnostic tool: review and early clinical examples.
      Breast MRI, as a truly 3D imaging modality, has the highest sensitivity for the detection of breast cancer and is therefore recommended for breast cancer screening in high-risk patients.
      • Narod S.A.
      MRI versus mammography for breast cancer screening in women with familial risk (FaMRIsc).
      The necessity for contrast media and the inability of depicting microcalcifications are two major disadvantages of breast MRI. Recently, spiral breast CT has been introduced as a promising imaging modality, combining truly 3D imaging and the capability of depicting microcalcifications and soft-tissue masses at the same time.
      Although the shape of MC is widely discussed to be indicative of the presence of malignancy, the assessment of morphologic characteristics of MC can be quite challenging in 2D images like mammograms. With the implementation of breast CT, there was the ambition to reduce the false-positive rate by more precisely assessing the 3D structure of microcalcifications. Unfortunately, the spatial resolution of BCT is too low to resolve their 3D shape. Recently, Kenkel et al. used a mathematical algorithm to assess the 3D structure of MC in histopathological specimens in a micro-CT and ruled out significant association with the B-classification of breast lesions.
      • Kenkel D.
      • Varga Z.
      • Heuer H.
      • et al.
      A micro CT study in patients with breast microcalcifications using a mathematical algorithm to assess 3D structure.
      Latest studies by Brahimetaj et al. investigated radiomics features in histological specimens in a micro-CT. They report a higher power of radiomics features to discriminate benign and malignant microcalcifications compared to morphological features.
      • Brahimetaj R.
      • Willekens I.
      • Massart A.
      • et al.
      Improved automated early detection of breast cancer based on high resolution 3D micro-CT microcalcification images.
      Compared to conventional mammography, which consists of four projection images (two craniocaudal views, and two mediolateral oblique views), one BCT examination can yield up to 3500 images (high- and low-resolution images). Therefore, the evaluation of a BCT dataset is much more time-consuming compared to conventional mammography with the risk that MC present only on a few images might be overlooked. Furthermore, the limited number of radiologists familiar with BCT assessment might add substantial risk if new inexperienced radiologists do the image interpretation. Therefore, there is an urgent need for a post-processing technology offering automatic detection and standardized assessment of MC.
      With the introduction of artificial intelligence in the evaluation of medical images, there have been various investigations concerning the detection of microcalcifications with machine learning algorithms to improve diagnostic performance and/or provide automated image evaluation of mammograms. Wang et al. were among the first to demonstrate that using Convolutional Neural Network classifiers could improve the accuracy of the detection of clustered microcalcifications in mammograms by reducing the high false positive rate observed in traditional computerized detection approaches. Becker et al. demonstrated that deep learning algorithms, designed for generic image analysis, can detect breast cancer presenting as suspicious calcifications with similar accuracy as experienced radiologists.
      • Becker A.S.
      • Marcon M.
      • Ghafoor S.
      • et al.
      Deep learning in mammography: diagnostic accuracy of a multipurpose image analysis software in the detection of breast cancer.
      Recently, Schönenberger et al. trained a dCNN to classify microcalcifications in mammograms with high accuracy.
      • Schonenberger C.
      • Hejduk P.
      • Ciritsis A.
      • et al.
      Classification of mammographic breast microcalcifications using a deep convolutional neural network: a BI-RADS-based approach.
      A recent study by Stelzer et al. combined texture analysis and machine learning for risk stratification of BI-RADS 4 microcalcifications, with the potential to prevent unnecessary biopsies in up to 45% of cases.
      • Stelzer P.D.
      • Steding O.
      • Raudner M.W.
      • et al.
      Combined texture analysis and machine learning in suspicious calcifications detected by mammography: potential to avoid unnecessary stereotactical biopsies.
      Landsmann et al. applied a dCNN to breast-CT images to classify breast density with high accuracy.
      • Landsmann A.
      • Wieler J.
      • Hejduk P.
      • et al.
      Applied machine learning in spiral breast-CT: can we train a deep convolutional neural network for automatic, standardized and observer independent classification of breast density?.
      Convolutional Neural Networks usually need to be trained with many images to reach sufficient accuracy. Although a single BCT examination can comprise up to 3500 images, few data depicting suspicious microcalcifications are available, which may be attributed to the low prevalence of breast cancer in the screening cohort, typically below 1%.
      • Pisano E.D.
      • Gatsonis C.
      • Hendrick E.
      • et al.
      Diagnostic performance of digital versus film mammography for breast-cancer screening.
      In our patient cohort, there were 24 women with reported suspicious microcalcifications in 2240 BCT examinations (1.1%). Since there are only a few installations worldwide and BCT is not yet recommended in any breast imaging guideline, this data gap is unlikely to be filled soon. To the best of our knowledge, we were the first to investigate the applicability of a dCNN, trained with mammographic images, to the analysis of breast-CT data. For our model, the accuracy was 90%. Regarding the different classes, the model correctly detected 100% of “suspicious microcalcifications” and 80% of “probably benign microcalcifications”, while the dCNN misclassified healthy tissue as suspicious in 40% of the cases. Compared to mammograms, breast-CT images appear much noisier. This effect was amplified by the magnification icons we created (Fig. 2), which probably caused the high false positive rate of 25% in BCT images. In a “real-world” setting, the high false positive rate may be compensated by a second reading by a radiologist. Ring artifacts are common in breast-CT, as recently reported by Berger et al.
      • Berger N.
      • Marcon M.
      • Frauenfelder T.
      • Boss A.
      Dedicated spiral breast computed tomography with a single photon-counting detector: initial results of the first 300 women.
      These ring artifacts, observed in every image, did not lead to misclassification by our dCNN model, particularly not in the sliding-window approach. Nevertheless, we observed different artifacts in our sliding-window approach. Because of the rectangular shape of the array used (351 × 280), the calculated heatmaps also appear rectangular (Fig. 4). U-Nets are a more recent convolutional network architecture that uses more precise segmentation for image analysis. Such a segmentation algorithm respects the borders of microcalcifications and/or associated lesions more accurately and results in more precise heatmaps.
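To illustrate why a sliding-window probability map takes on a blocky, rectangular appearance, the following is a minimal sketch in plain Python. It is not the implementation used in this study: the function names, the window and stride sizes, and the `toy_classifier` (a hypothetical stand-in for the trained dCNN that simply flags any bright pixel) are assumptions made for illustration only.

```python
def sliding_window_heatmap(image, classify, window=(4, 4), stride=(2, 2)):
    """Average per-window scores back onto the pixel grid.

    `image` is a 2D list of floats; `classify` maps a patch (2D list)
    to a score in [0, 1]. Both are placeholders for illustration.
    """
    rows, cols = len(image), len(image[0])
    win_r, win_c = window
    step_r, step_c = stride
    total = [[0.0] * cols for _ in range(rows)]
    hits = [[0] * cols for _ in range(rows)]

    for top in range(0, rows - win_r + 1, step_r):
        for left in range(0, cols - win_c + 1, step_c):
            patch = [row[left:left + win_c] for row in image[top:top + win_r]]
            score = float(classify(patch))
            # Every pixel under the window receives the same score, so each
            # window contributes a rectangular footprint to the map.
            for r in range(top, top + win_r):
                for c in range(left, left + win_c):
                    total[r][c] += score
                    hits[r][c] += 1

    # Pixels covered by several overlapping windows get the mean score.
    return [
        [total[r][c] / hits[r][c] if hits[r][c] else 0.0 for c in range(cols)]
        for r in range(rows)
    ]


def toy_classifier(patch):
    # Hypothetical stand-in for the dCNN: "suspicious" if any pixel is bright.
    return 1.0 if max(max(row) for row in patch) > 0.9 else 0.0


# A dark 8 x 8 image with a single bright pixel.
image = [[0.0] * 8 for _ in range(8)]
image[1][1] = 1.0
heatmap = sliding_window_heatmap(image, toy_classifier)
```

In this toy example a single bright pixel raises the score of an entire window-sized rectangle, and the score only falls off in steps where overlapping windows that do not contain the pixel dilute the average. A pixel-wise segmentation network would instead assign a score to each pixel individually.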
      Our study has several limitations. First, this was a single-center study and only a small number of breast-CT images were used to test our dCNN model. Second, only magnification images depicting representative regions of interest were used. Magnification increased the noise of the images, leading to a relatively high false positive rate of 25%. Although the false positive rate for expert readers was only 10%, image noise is also an issue for radiologists in BCT assessment. Third, we only used high-resolution raw data images in the coronal plane. In the “real-world” setting, the radiologist can assess the examination in a multiplanar view and/or as post-processed images. However, image analysis using maximum intensity projection (MIP) was beyond the scope of this study.
      In conclusion, we were able to demonstrate the feasibility of a dCNN, trained with mammographic images, to detect microcalcifications in breast-CT images. Since this was a proof of principle, future studies will have to show whether a dCNN can be trained with breast-CT images. The aim is to implement deep learning models that reflect interpretation workflows as performed by radiologists.
      The following is the supplementary data related to this article.
      Supplementary Fig. 1

      Funding

      This study was funded by the Clinical Research Priority Program Artificial Intelligence in oncological Imaging of the University of Zurich and the Swiss National Science Foundation SNF Sinergia 183568.

      References

        • Bassett L.W.
        • Gold R.H.
        The evolution of mammography.
        AJR Am J Roentgenol. 1988; 150: 493-498 https://doi.org/10.2214/ajr.150.3.493
        • Becker A.S.
        • Marcon M.
        • Ghafoor S.
        • et al.
        Deep learning in mammography: diagnostic accuracy of a multipurpose image analysis software in the detection of breast cancer.
        Invest Radiol. 2017; 52: 434-440 https://doi.org/10.1097/rli.0000000000000358
        • Berger N.
        • Marcon M.
        • Frauenfelder T.
        • Boss A.
        Dedicated spiral breast computed tomography with a single photon-counting detector: initial results of the first 300 women.
        Invest Radiol. 2020; 55: 68-72 https://doi.org/10.1097/rli.0000000000000609
        • Berger N.
        • Marcon M.
        • Saltybaeva N.
        • et al.
        Dedicated breast computed tomography with a photon-counting detector: initial results of clinical in vivo imaging.
        Invest Radiol. 2019; 54: 409-418 https://doi.org/10.1097/rli.0000000000000552
        • Brahimetaj R.
        • Willekens I.
        • Massart A.
        • et al.
        Improved automated early detection of breast cancer based on high resolution 3D micro-CT microcalcification images.
        BMC Cancer. 2022; 22: 162 https://doi.org/10.1186/s12885-021-09133-4
        • Cohen J.
        Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit.
        Psychol Bull. 1968; 70: 213-220 https://doi.org/10.1037/h0026256
        • DeLong E.R.
        • DeLong D.M.
        • Clarke-Pearson D.L.
        Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach.
        Biometrics. 1988; 44: 837-845
        • Egan R.L.
        Contributions of mammography in the detection of early breast cancer.
        Cancer. 1971; 28: 1555-1557 https://doi.org/10.1002/1097-0142(197112)28:6%3C1555::aid-cncr2820280632%3E3.0.co;2-w
        • Germann M.
        • Shim S.
        • Angst F.
        • Saltybaeva N.
        • Boss A.
        Spiral breast computed tomography (CT): signal-to-noise and dose optimization using 3D-printed phantoms.
        Eur Radiol. 2021; 31: 3693-3702 https://doi.org/10.1007/s00330-020-07549-3
        • Hofvind S.
        • Ponti A.
        • Patnick J.
        • et al.
        False-positive results in mammographic screening for breast cancer in Europe: a literature review and survey of service screening programmes.
        J Med Screen. 2012; 19: 57-66 https://doi.org/10.1258/jms.2012.012083
        • Kalender W.A.
        • Beister M.
        • Boone J.M.
        • et al.
        High-resolution spiral CT of the breast at very low dose: concept and feasibility considerations.
        Eur Radiol. 2012; 22: 1-8 https://doi.org/10.1007/s00330-011-2169-4
        • Kamangar F.
        • Dores G.M.
        • Anderson W.F.
        Patterns of cancer incidence, mortality, and prevalence across five continents: defining priorities to reduce cancer disparities in different geographic regions of the world.
        J Clin Oncol. 2006; 24: 2137-2150 https://doi.org/10.1200/jco.2005.05.2308
        • Kenkel D.
        • Varga Z.
        • Heuer H.
        • et al.
        A micro CT study in patients with breast microcalcifications using a mathematical algorithm to assess 3D structure.
        PLoS One. 2017; 12: e0169349 https://doi.org/10.1371/journal.pone.0169349
        • Landis J.R.
        • Koch G.G.
        The measurement of observer agreement for categorical data.
        Biometrics. 1977; 33: 159-174
        • Landsmann A.
        • Wieler J.
        • Hejduk P.
        • et al.
        Applied machine learning in spiral breast-CT: can we train a deep convolutional neural network for automatic, standardized and observer independent classification of breast density?.
        Diagnostics (Basel). 2022; 12: 181 https://doi.org/10.3390/diagnostics12010181
        • Leborgne R.
        Diagnosis of tumors of the breast by simple roentgenography; calcifications in carcinomas.
        Am J Roentgenol Radium Ther. 1951; 65: 1-11
        • Narod S.A.
        MRI versus mammography for breast cancer screening in women with familial risk (FaMRIsc).
        Lancet Oncol. 2019; 20: e465 https://doi.org/10.1016/s1470-2045(19)30489-9
        • O'Connell A.M.
        • Karellas A.
        • Vedantham S.
        The potential role of dedicated 3D breast CT as a diagnostic tool: review and early clinical examples.
        Breast J. 2014; 20: 592-605 https://doi.org/10.1111/tbj.12327
        • Pisano E.D.
        • Gatsonis C.
        • Hendrick E.
        • et al.
        Diagnostic performance of digital versus film mammography for breast-cancer screening.
        N Engl J Med. 2005; 353: 1773-1783 https://doi.org/10.1056/nejmoa052911
        • Salomon A.
        Beiträge zur Pathologie und Klinik der Mammakarzinome.
        Arch Klin Chir. 1913; 101: 573-668
        • Schonenberger C.
        • Hejduk P.
        • Ciritsis A.
        • et al.
        Classification of mammographic breast microcalcifications using a deep convolutional neural network: a BI-RADS-based approach.
        Invest Radiol. 2021; 56: 224-231 https://doi.org/10.1097/rli.0000000000000729
        • Shim S.
        • Saltybaeva N.
        • Berger N.
        • et al.
        Lesion detectability and radiation dose in spiral breast CT with photon-counting detector technology: a phantom study.
        Invest Radiol. 2020; 55: 515-523 https://doi.org/10.1097/rli.0000000000000662
        • Stelzer P.D.
        • Steding O.
        • Raudner M.W.
        • et al.
        Combined texture analysis and machine learning in suspicious calcifications detected by mammography: potential to avoid unnecessary stereotactical biopsies.
        Eur J Radiol. 2020; 132: 109309 https://doi.org/10.1016/j.ejrad.2020.109309
        • Stomper P.C.
        • Geradts J.
        • Edge S.B.
        • Levine E.G.
        Mammographic predictors of the presence and size of invasive carcinomas associated with malignant microcalcification lesions without a mass.
        AJR Am J Roentgenol. 2003; 181: 1679-1684 https://doi.org/10.2214/ajr.181.6.1811679
        • Strax P.
        • Venet L.
        • Shapiro S.
        • Gross S.
        Mammography and clinical examination in mass screening for cancer of the breast.
        Cancer. 1967; 20: 2184-2188 https://doi.org/10.1002/1097-0142(196712)20:12%3C2184::aid-cncr2820201217%3E3.0.co;2-3
        • Sung H.
        • Ferlay J.
        • Siegel R.L.
        • et al.
        Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries.
        CA Cancer J Clin. 2021; 71: 209-249 https://doi.org/10.3322/caac.21660
        • Wetzl M.
        • Wenkel E.
        • Dietzel M.
        • et al.
        Potential of spiral breast computed tomography to increase patient comfort compared to DM.
        Eur J Radiol. 2021; 145: 110038 https://doi.org/10.1016/j.ejrad.2021.110038
        • Wienbeck S.
        • Uhlig J.
        • Luftner-Nagel S.
        • et al.
        The role of cone-beam breast-CT for breast cancer detection relative to breast density.
        Eur Radiol. 2017; 27: 5185-5195 https://doi.org/10.1007/s00330-017-4911-z
        • Wilkinson L.
        • Thomas V.
        • Sharma N.
        Microcalcification on mammography: approaches to interpretation and biopsy.
        Br J Radiol. 2017; 90: 20160594 https://doi.org/10.1259/bjr.20160594