Towards Diverse Binary Segmentation via a Simple yet General Gated Network

International Journal of Computer Vision

Abstract

In many binary segmentation tasks, most CNN-based methods adopt a U-shaped encoder-decoder network as their basic structure. They ignore two key problems that arise when the encoder exchanges information with the decoder: the lack of an interference-control mechanism between the two, and the failure to account for the differing contributions of different encoder levels. In this work, we propose a simple yet general gated network (GateNet) to tackle both problems at once. With the help of multi-level gate units, valuable context information from the encoder can be selectively transmitted to the decoder. In addition, we design a gated dual-branch structure to build cooperation among features of different levels and improve the discriminative ability of the network. Furthermore, we introduce a “Fold” operation that improves atrous convolution and yields a novel folded atrous convolution, which can be flexibly embedded in ASPP or DenseASPP to accurately localize foreground objects of various scales. GateNet can be easily generalized to many binary segmentation tasks, including general and specific object segmentation and multi-modal segmentation. Without bells and whistles, our network consistently performs favorably against state-of-the-art methods under 10 metrics on 33 datasets covering 10 binary segmentation tasks.
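
To make the gating mechanism concrete, below is a minimal PyTorch sketch of a gate unit on one skip connection, assuming a simple convolution-plus-sigmoid gate; the module and variable names are illustrative assumptions, not the authors' exact implementation. A gate map predicted from the concatenated encoder and decoder features reweights the encoder feature before it crosses to the decoder, and since every encoder level receives its own gate, different levels can contribute to the decoder in different amounts.

import torch
import torch.nn as nn

class GateUnit(nn.Module):
    """Illustrative gate unit for one skip connection (assumed form, not the official code)."""

    def __init__(self, enc_channels, dec_channels):
        super().__init__()
        # predict a single-channel gate map from both streams
        self.gate = nn.Sequential(
            nn.Conv2d(enc_channels + dec_channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # gate values in (0, 1): 0 suppresses, 1 transmits
        )

    def forward(self, enc_feat, dec_feat):
        g = self.gate(torch.cat([enc_feat, dec_feat], dim=1))
        # interference control: only the gated encoder feature reaches the decoder
        return enc_feat * g

# usage: gate a 64-channel encoder feature with the corresponding decoder feature
gate = GateUnit(64, 64)
enc = torch.randn(1, 64, 56, 56)
dec = torch.randn(1, 64, 56, 56)
print(gate(enc, dec).shape)  # torch.Size([1, 64, 56, 56])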

References

  • Achanta, R., Hemami, S., Estrada, F., & Süsstrunk, S. (2009). Frequency-tuned salient region detection. In CVPR (pp. 1597–1604).

  • Adelson, E., Anderson, C., Bergen, J., Burt, P., & Ogden, J. (1983). Pyramid methods in image processing. RCA Engineering, 29, 11.

  • Amirul Islam, M., Rochan, M., Bruce, N. D. B., & Wang, Y. (2017). Gated feedback refinement network for dense image labeling. In CVPR (pp. 3751–3759).

  • Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2010). Contour detection and hierarchical image segmentation. IEEE TPAMI, 33, 898–916.

  • Bernal, J., Sánchez, F. J., Fernández-Esparrach, G., Gil, D., Rodríguez, C., & Vilariño, F. (2015). Wm-dova maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians. CMIG, 43, 99–111.

  • Cai, L., Wu, M., Chen, L., Bai, W., Yang, M., Lyu, S., & Zhao, Q. (2022). Using guided self-attention with local information for polyp segmentation. In MICCAI (pp. 629–638).

  • Chen, S., & Fu, Y. (2020). Progressively guided alternate refinement network for rgb-d salient object detection. In ECCV (pp. 520–538).

  • Chen, H., & Li, Y. (2018). Progressively complementarity-aware fusion network for rgb-d salient object detection. In CVPR (pp. 3051–3060).

  • Chen, G., Han, K., & Wong, K.-Y. K. (2018a). Tom-net: Learning transparent object matting from a single image. In CVPR (pp. 9233–9241).

  • Chen, Q., Liu, Z., Zhang, Y., Fu, K., Zhao, Q., & Du, H. (2021). Rgb-d salient object detection via 3d convolutional neural networks. In AAAI (pp. 1063–1071).

  • Chen, S., Tan, X., Wang, B., & Hu, X. (2018c). Reverse attention for salient object detection. In ECCV (pp. 234–250).

  • Chen, Z., Xu, Q., Cong, R., & Huang, Q. (2020d). Global context-aware progressive aggregation network for salient object detection. In AAAI (pp. 10599–10606).

  • Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018b). Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV (pp. 801–818).

  • Chen, Z., Cong, R., Xu, Q., & Huang, Q. (2020c). Dpanet: Depth potentiality-aware gated attention network for rgb-d salient object detection. IEEE TIP, 30, 7012–7024.

  • Chen, H., Deng, Y., Li, Y., Hung, T.-Y., & Lin, G. (2020b). Rgbd salient object detection via disentangled cross-modal fusion. IEEE TIP, 29, 8407–8416.

  • Cheng, Y., Fu, H., Wei, X., Xiao, J., & Cao, X. (2014). Depth enhanced saliency detection method. In ICIMCS (p. 23).

  • Cheng, M., Kong, Z., Song, G., Tian, Y., Liang, Y., & Chen, J. (2021a). Learnable oriented-derivative network for polyp segmentation. In MICCAI (pp. 720–730).

  • Cheng, B., Misra, I., Schwing, A. G., Kirillov, A., & Girdhar, R. (2022a). Masked-attention mask transformer for universal image segmentation. In CVPR (pp. 1290–1299).

  • Cheng, X., Zheng, X., Pei, J., Tang, H., Lyu, Z., & Chen, C. (2022b). Depth-induced gap-reducing network for rgb-d salient object detection: An interaction, guidance and refinement approach. IEEE TMM.

  • Cheng, M.-M., Gao, S.-H., Borji, A., Tan, Y.-Q., Lin, Z., & Wang, M. (2021b). A highly efficient model to study the semantics of salient object detection. IEEE TPAMI, 44, 8006–8021.

  • Cheng, M.-M., Mitra, N. J., Huang, X., Torr, P. H. S., & Hu, S.-M. (2014). Global contrast based salient region detection. IEEE TPAMI, 37, 569–582.

  • Chen, H., & Li, Y. (2019). Three-stream attention-aware network for rgb-d salient object detection. IEEE TIP, 28, 2825–2835.

  • Chen, H., Li, Y., & Su, D. (2019). Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for rgb-d salient object detection. Pattern Recognition, 86, 376–385.

  • Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE TPAMI, 40, 834–848.

  • Chen, C., Wei, J., Peng, C., Zhang, W., & Qin, H. (2020a). Improved saliency detection in rgb-d images using two-phase depth estimation and selective deep fusion. IEEE TIP, 29, 4296–4307.

  • Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In CVPR (pp. 1251–1258).

  • Cong, R., Sun, M., Zhang, S., Zhou, X., Zhang, W., & Zhao, Y. (2023). Frequency perception network for camouflaged object detection. arXiv preprint arXiv:2308.08924.

  • Cong, R., Lin, Q., Zhang, C., Li, C., Cao, X., Huang, Q., & Zhao, Y. (2022a). Cir-net: Cross-modality interaction and refinement for rgb-d salient object detection. IEEE TIP, 31, 6800–6815.

  • Cong, R., Zhang, Y., Fang, L., Li, J., Zhao, Y., & Kwong, S. (2022b). RRNet: Relational reasoning network with parallel multi-scale attention for salient object detection in optical remote sensing images. IEEE TGRS, 60, 1558–1644.

  • Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In CVPR (pp. 3213–3223).

  • Cun, X., & Pun, C.-M. (2020). Defocus blur detection via depth distillation. In ECCV (pp. 747–763).

  • Jha, D., Smedsrud, P. H., Riegler, M. A., Halvorsen, P., de Lange, T., Johansen, D., & Johansen, H. D. (2020). Kvasir-seg: A segmented polyp dataset. In MMM (pp. 451–462).

  • Deng, Z., Hu, X., Zhu, L., Xu, X., Qin, J., Han, G., & Heng, P.-A. (2018). R3net: Recurrent residual refinement network for saliency detection. In IJCAI (pp. 684–690).

  • Deng, X., Zhang, P., Liu, W., & Lu, H. (2023). Recurrent multi-scale transformer for high-resolution salient object detection. arXiv preprint arXiv:2308.03826.

  • Ding, B., Long, C., Zhang, L., & Xiao, C. (2019). Argan: Attentive recurrent generative adversarial network for shadow detection and removal. In ICCV (pp. 10213–10222).

  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

  • Dou, Z.-Y., Xu, Y., Gan, Z., Wang, J., Wang, S., Wang, L., Zhu, C., Zhang, P., Yuan, L., Peng, N., et al. (2022). An empirical study of training end-to-end vision-and-language transformers. In CVPR (pp. 18166–18176).

  • Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (voc) challenge. IJCV, 88(2), 303–338.

  • Fan, D.-P., Cheng, M.-M., Liu, Y., Li, T., & Borji, A. (2017). Structure-measure: A new way to evaluate foreground maps. In ICCV (pp. 4548–4557).

  • Fan, D.-P., Gong, C., Cao, Y., Ren, B., Cheng, M.-M., & Borji, A. (2018). Enhanced-alignment measure for binary foreground map evaluation. arXiv preprint arXiv:1805.10421.

  • Fan, D.-P., Ji, G.-P., Sun, G., Cheng, M.-M., Shen, J., & Shao, L. (2020a). Camouflaged object detection. In CVPR (pp. 2777–2787).

  • Fan, D.-P., Ji, G.-P., Zhou, T., Chen, G., Fu, H., Shen, J., & Shao, L. (2020b). Pranet: Parallel reverse attention network for polyp segmentation. In MICCAI (pp. 263–273).

  • Fan, K., Wang, C., Wang, Y., Wang, C., Yi, R., & Ma, L. (2023). Rfenet: Towards reciprocal feature evolution for glass segmentation. arXiv preprint arXiv:2307.06099.

  • Fan, D.-P., Zhai, Y., Borji, A., Yang, J., & Shao, L. (2020d). Bbs-net: Rgb-d salient object detection with a bifurcated backbone strategy network. In ECCV (pp. 275–292).

  • Fang, Y., Chen, C., Yuan, Y., & Tong, K. (2019). Selective feature aggregation network with area-boundary constraints for polyp segmentation. In MICCAI (pp. 302–310).

  • Fang, H., Gupta, S., Iandola, F., Srivastava, R. K., Deng, L., Dollár, P., Gao, J., He, X., Mitchell, M., Platt, J. C., et al. (2015). From captions to visual concepts and back. In CVPR (pp. 1473–1482).

  • Fang, X., He, X., Wang, L., & Shen, J. (2021). Robust shadow detection by exploring effective shadow contexts. In ACM MM (pp. 2927–2935).

  • Fan, D.-P., Ji, G.-P., Cheng, M.-M., & Shao, L. (2021). Concealed object detection. IEEE TPAMI, 44, 6024–6042.

  • Fan, D.-P., Lin, Z., Zhang, Z., Zhu, M., & Cheng, M.-M. (2020c). Rethinking rgb-d salient object detection: Models, data sets, and large-scale benchmarks. IEEE TNNLS, 32, 2075–2089.

  • Feng, M., Lu, H., & Ding, E. (2019). Attentive feedback network for boundary-aware salient object detection. In CVPR (pp. 1623–1632).

  • Fu, K., Fan, D.-P., Ji, G.-P., & Zhao, Q. (2020a). Jl-dcf: Joint learning and densely-cooperative fusion framework for rgb-d salient object detection. In CVPR (pp. 3052–3062).

  • Gao, S.-H., Tan, Y.-Q., Cheng, M.-M., Lu, C., Chen, Y., & Yan, S. (2020). Highly efficient salient object detection with 100k parameters. In ECCV (pp. 702–721).

  • Gao, S.-H., Cheng, M.-M., Zhao, K., Zhang, X.-Y., Yang, M.-H., & Torr, P. (2019). Res2net: A new multi-scale backbone architecture. IEEE TPAMI, 43, 652–662.

  • Gu, Y.-C., Gao, S.-H., Cao, X.-S., Du, P., Lu, S.-P., & Cheng, M.-M. (2021). inas: Integral nas for device-aware salient object detection. In ICCV (pp. 4934–4944).

  • Gu, Y., Xu, H., Quan, Y., Chen, W., & Zheng, J. (2023). Orsi salient object detection via bidimensional attention and full-stage semantic guidance. IEEE TGRS, 61, 1–13.

  • Guan, H., Lin, J., & Lau, R. W. H. (2022). Learning semantic associations for mirror detection. In CVPR (pp. 5941–5950).

  • He, H., Li, X., Cheng, G., Shi, J., Tong, Y., Meng, G., Prinet, V., & Weng, L. B. (2021). Enhanced boundary learning for glass-like object segmentation. In ICCV (pp. 15859–15868).

  • He, C., Li, K., Zhang, Y., Tang, L., Zhang, Y., Guo, Z., & Li, X. (2023). Camouflaged object detection with feature decomposition and edge reconstruction. In CVPR (pp. 22046–22055).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).

  • Hou, Q., Cheng, M.-M., Hu, X., Borji, A., Tu, Z., & Torr, P. H. S. (2017). Deeply supervised salient object detection with short connections. In CVPR (pp. 3203–3212).

  • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

  • Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. In CVPR (pp. 7132–7141).

  • Hu, X., Wang, S., Qin, X., Dai, H., Ren, W., Luo, D., Tai, Y., & Shao, L. (2023a). High-resolution iterative feedback network for camouflaged object detection. In AAAI (Vol. 37, pp. 881–889).

  • Hu, X., Zhu, L., Fu, C.-W., Qin, J., & Heng, P.-A. (2018). Direction-aware spatial context features for shadow detection. In CVPR (pp. 7454–7462).

  • Huang, Z., Dai, H., Xiang, T.-Z., Wang, S., Chen, H.-X., Qin, J., & Xiong, H. (2023). Feature shrinkage pyramid for camouflaged object detection with transformers. In CVPR (pp. 5557–5566).

  • Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In CVPR (pp. 4700–4708).

  • Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., & Liu, W. (2019). Ccnet: Criss-cross attention for semantic segmentation. In ICCV (pp. 603–612).

  • Hu, X., Fu, C.-W., Zhu, L., Qin, J., & Heng, P.-A. (2019). Direction-aware spatial context features for shadow detection and removal. IEEE TPAMI, 42, 2795–2808.

  • Hu, X., Wang, T., Fu, C.-W., Jiang, Y., Wang, Q., & Heng, P.-A. (2021). Revisiting shadow detection: A new benchmark dataset for complex world. IEEE TIP, 30, 1925–1934.

  • Jha, D., Smedsrud, P. H., Johansen, D., de Lange, T., Johansen, H. D., Halvorsen, P., & Riegler, M. A. (2021). A comprehensive study on colorectal polyp segmentation with resunet++, conditional random field and test-time augmentation. IEEE JBHI, 25, 2029–2040.

  • Jia, Q., Yao, S., Liu, Y., Fan, X., Liu, R., & Luo, Z. (2022). Segment, magnify and reiterate: Detecting camouflaged objects the hard way. In CVPR (pp. 4713–4722).

  • Jiang, Z., Xu, X., Zhang, L., Zhang, C., Foo, C. S., & Zhu, C. (2022). Ma-ganet: A multi-attention generative adversarial network for defocus blur detection. IEEE TIP, 31, 3494–3508.

  • Jiang, B., Zhou, Z., Wang, X., Tang, J., & Luo, B. (2020). cmsalgan: Rgb-d salient object detection with cross-view generative adversarial networks. IEEE TMM, 23, 1343–1353.

  • Jin, W.-D., Xu, J., Han, Q., Zhang, Y., & Cheng, M.-M. (2021). Cdnet: Complementary depth network for rgb-d salient object detection. IEEE TIP, 30, 3376–3390.

  • Ji, G.-P., Zhu, L., Zhuge, M., & Fu, K. (2022). Fast camouflaged object detection via edge-based reversible re-calibration network. Pattern Recognition, 123, 108414.

  • Ju, R., Ge, L., Geng, W., Ren, T., & Wu, G. (2014). Depth saliency based on anisotropic center-surround difference. In ICIP (pp. 1115–1119).

  • Junejo, I. N., & Foroosh, H. (2008). Estimating geo-temporal location of stationary cameras using shadow trajectories. In ECCV (pp. 318–331).

  • Karim, R., Islam, M. A., & Bruce, N. D. B. (2019). Recurrent iterative gating networks for semantic segmentation. In WACV (pp. 1070–1079).

  • Ke, Y. Y., & Tsubono, T. (2022). Recursive contour-saliency blending network for accurate salient object detection. In WACV (pp. 2940–2950).

  • Kim, T., Lee, H., & Kim, D. (2021). Uacanet: Uncertainty augmented context attention for polyp segmentation. In ACM MM (pp. 2167–2175).

  • Kim, J., & Kim, W. (2020). Attentive feedback feature pyramid network for shadow detection. IEEE SPL, 27, 1964–1968.

  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In ICLR.

  • Le, H., Vicente, T. F. Y., Nguyen, V., Hoai, M., & Samaras, D. (2018). A+ d net: Training a shadow detector with adversarial shadow attenuation. In ECCV (pp. 662–678).

  • Lee, M., Park, C., Cho, S., & Lee, S. (2022). Spsn: Superpixel prototype sampling network for rgb-d salient object detection. In ECCV (pp. 630–647).

  • Lee, C.-Y., Xie, S., Gallagher, P., Zhang, Z., & Tu, Z. (2015). Deeply-supervised nets. In Artificial intelligence and statistics, PMLR (pp. 562–570).

  • Le, T.-N., Nguyen, T. V., Nie, Z., Tran, M.-T., & Sugimoto, A. (2019). Anabranch network for camouflaged object segmentation. CVIU, 184, 45–56.

  • Yu, L., Mei, H., Dong, W., Wei, Z., Zhu, L., Wang, Y., & Yang, X. (2022). Progressive glass segmentation. IEEE TIP, 31, 2920–2933.

  • Li, G., & Yu, Y. (2015). Visual saliency based on multiscale deep features. In CVPR (pp. 5455–5463).

  • Li, C., Cong, R., Piao, Y., Xu, Q., & Loy, C. C. (2020b). Rgb-d salient object detection with cross-modality modulation and selection. In ECCV (pp. 225–241).

  • Li, Y., Hou, X., Koch, C., Rehg, J. M., & Yuille, A. L. (2014). The secrets of salient object segmentation. In CVPR (pp. 280–287).

  • Li, G., Liu, Z., Ye, L., Wang, Y., & Ling, H. (2020d). Cross-modal weighting network for rgb-d salient object detection. In ECCV (pp. 665–681).

  • Li, N., Ye, J., Ji, Y., Ling, H., & Yu, J. (2014). Saliency detection on light field. In CVPR (pp. 2806–2813).

  • Li, A., Zhang, J., Lv, Y., Liu, B., Zhang, T., & Dai, Y. (2021a). Uncertainty-aware joint salient object and camouflaged object detection. In CVPR (pp. 10071–10081).

  • Liao, G., Gao, W., Jiang, Q., Wang, R., & Li, G. (2020). Mmnet: Multi-stage and multi-scale fusion network for rgb-d salient object detection. In ACM MM (pp. 2436–2444).

  • Liao, J., Liu, Y., Xing, G., Wei, H., Chen, J., & Xu, S. (2021). Shadow detection via predicting the confidence maps of shadow detection methods. In ACM MM (pp. 704–712).

  • Li, C., Cong, R., Guo, C., Li, H., Zhang, C., Zheng, F., & Zhao, Y. (2020a). A parallel down-up fusion network for salient object detection in optical remote sensing images. Neurocomputing, 415, 411–420.

  • Li, C., Cong, R., Hou, J., Zhang, S., Qian, Y., & Kwong, S. (2019). Nested network with two-stream pyramid for salient object detection in optical remote sensing images. IEEE TGRS, 57, 9156–9166.

  • Li, J., Ji, W., Zhang, M., Piao, Y., Lu, H., & Cheng, L. (2023a). Delving into calibrated depth for accurate rgb-d salient object detection. IJCV, 131, 855–876.

  • Li, J., Liang, B., Lu, X., Li, M., Lu, G., & Xu, Y. (2023b). From global to local: Multi-patch and multi-scale contrastive similarity learning for unsupervised defocus blur detection. IEEE TIP, 32, 1158–1169.

  • Li, G., Liu, Z., Chen, M., Bai, Z., Lin, W., & Ling, H. (2021b). Hierarchical alternate interaction network for rgb-d salient object detection. IEEE TIP, 30, 3528–3542.

  • Li, G., Liu, Z., & Ling, H. (2020c). Icnet: Information conversion network for rgb-d based salient object detection. IEEE TIP, 29, 4873–4884.

  • Li, G., Liu, Z., Lin, W., & Ling, H. (2022c). Multi-content complementation network for salient object detection in optical remote sensing images. IEEE TGRS, 60, 1–13.

  • Li, G., Liu, Z., Zhang, X., & Lin, W. (2023). Lightweight salient object detection in optical remote-sensing images via semantic matching and edge alignment. IEEE TGRS, 61, 1–11.

  • Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In CVPR (pp. 2117–2125).

  • Lin, J., He, Z., & Lau, R. W. H. (2021). Rich context aggregation with reflection prior for glass surface detection. In CVPR (pp. 13415–13424).

  • Lin, J., Wang, G., & Lau, R. W. H. (2020). Progressive mirror detection. In CVPR (pp. 3697–3705).

  • Lin, W., Cao, X., & Foroosh, H. (2010). Camera calibration and geo-location estimation from two shadow trajectories. CVIU, 114, 915–927.

  • Liu, N., & Han, J. (2016). Dhsnet: Deep hierarchical saliency network for salient object detection. In CVPR (pp. 678–686).

  • Liu, N., Han, J., & Yang, M.-H. (2018). Picanet: Learning pixel-wise contextual attention for saliency detection. In CVPR (pp. 3089–3098).

  • Liu, J.-J., Hou, Q., Cheng, M.-M., Feng, J., & Jiang, J. (2019a). A simple pooling-based design for real-time salient object detection. In CVPR (pp. 3917–3926).

  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021c). Swin transformer: Hierarchical vision transformer using shifted windows. In CVPR (pp. 10012–10022).

  • Liu, Z., Wang, Y., Tu, Z., Xiao, Y., & Tang, B. (2021d). Tritransnet: Rgb-d salient object detection with a triplet transformer embedding network. In ACM MM (pp. 4481–4490).

  • Liu, N., Zhang, N., & Han, J. (2020). Learning selective self-mutual attention for rgb-d saliency detection. In CVPR (pp. 13756–13765).

  • Liu, X., Zhang, Y., Cong, R., Zhang, C., Yang, N., Zhang, C., & Zhao, Y. (2021b). Ggrnet: Global graph reasoning network for salient object detection in optical remote sensing images. In PRCV (pp. 584–596).

  • Liu, Z., Zhang, Z., Tan, Y., & Wu, W. (2022b). Boosting camouflaged object detection with dual-task interactive transformer. In ICPR (pp. 140–146).

  • Liu, N., Zhang, N., Wan, K., Shao, L., & Han, J. (2021a). Visual saliency transformer. In ICCV (pp. 4722–4732).

  • Liu, Y., Zhang, Q., Zhang, D., & Han, J. (2019b). Employing deep part-object relationships for salient object detection. In ICCV (pp. 1232–1241).

  • Liu, J.-J., Hou, Q., Liu, Z.-A., & Cheng, M.-M. (2022a). Poolnet+: Exploring the potential of pooling for salient object detection. IEEE TPAMI, 45, 887–904.

  • Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., & Shum, H.-Y. (2010). Learning to detect a salient object. IEEE TPAMI, 33, 353–367.

  • Li, P., Yan, X., Zhu, H., Wei, M., Zhang, X.-P., & Qin, J. (2022). Findnet: Can you find me? Boundary-and-texture enhancement network for camouflaged object detection. IEEE TIP, 31, 6396–6411.

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In CVPR (pp. 3431–3440).

  • Luo, A., Li, X., Yang, F., Jiao, Z., Cheng, H., & Lyu, S. (2020). Cascade graph neural networks for rgb-d salient object detection. In ECCV (pp. 346–364).

  • Lv, Y., Zhang, J., Dai, Y., Li, A., Liu, B., Barnes, N., & Fan, D.-P. (2021). Simultaneously localize, segment and rank the camouflaged objects. In CVPR (pp. 11591–11601).

  • Ma, M., Xia, C., & Li, J. (2021). Pyramidal feature shrinking for salient object detection. In AAAI (pp. 2311–2318).

  • Margolin, R., Zelnik-Manor, L., & Tal, A. (2014). How to evaluate foreground maps? In CVPR (pp. 248–255).

  • Ma, M., Xia, C., Xie, C., Chen, X., & Li, J. (2023). Boosting broader receptive fields for salient object detection. IEEE TIP, 32, 1026–1038.

  • Mehta, S., Rastegari, M., Caspi, A., Shapiro, L., & Hajishirzi, H. (2018). Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In ECCV.

  • Mei, H., Dong, B., Dong, W., Peers, P., Yang, X., Zhang, Q., & Wei, X. (2021a). Depth-aware mirror segmentation. In CVPR (pp. 3044–3053).

  • Mei, H., Ji, G.-P., Wei, Z., Yang, X., Wei, X., & Fan, D.-P. (2021b). Camouflaged object segmentation with distraction mining. In CVPR (pp. 8772–8781).

  • Mei, H., Yang, X., Wang, Y., Liu, Y., He, S., Zhang, Q., Wei, X., & Lau, R. W. H. (2020). Don’t hit me! glass detection in real-world scenes. In CVPR (pp. 3687–3696).

  • Nguyen, T.-C., Nguyen, T.-P., Diep, G.-H., Tran-Dinh, A.-H., Nguyen, T. V., & Tran, M.-T. (2021). Ccbanet: Cascading context and balancing attention for polyp segmentation. In MICCAI (pp. 633–643).

  • Niu, Y., Geng, Y., Li, X., & Liu, F. (2012). Leveraging stereopsis for saliency analysis. In CVPR (pp. 454–461).

  • Pang, Y., Zhang, L., Zhao, X., & Lu, H. (2020a). Hierarchical dynamic filtering network for rgb-d salient object detection. In ECCV (pp. 235–252).

  • Pang, Y., Zhao, X., Xiang, T.-Z., Zhang, L., & Lu, H. (2022). Zoom in and out: A mixed-scale triplet network for camouflaged object detection. In CVPR (pp. 2160–2170).

  • Pang, Y., Zhao, X., Zhang, L., & Lu, H. (2020b). Multi-scale interactive network for salient object detection. In CVPR (pp. 9413–9422).

  • Pang, Y., Zhao, X., Zhang, L., & Lu, H. (2023). Caver: Cross-modal view-mixed transformer for bi-modal salient object detection. IEEE TIP, 32, 892–904.

  • Park, H., Yoo, Y., Seo, G., Han, D., Yun, S., & Kwak, N. (2018). C3: Concentrated-comprehensive convolution and its application to semantic segmentation. arXiv preprint arXiv:1812.04920.

  • Pei, J., Cheng, T., Fan, D.-P., Tang, H., Chen, C., & Gool, L. V. (2022). Osformer: One-stage camouflaged instance segmentation with transformers. In ECCV (pp. 19–37).

  • Peng, H., Li, B., Xiong, W., Hu, W., & Ji, R. (2014). Rgbd salient object detection: A benchmark and algorithms. In ECCV (pp. 92–109).

  • Peng, C., Zhang, X., Yu, G., Luo, G., & Sun, J. (2017). Large kernel matters–improve semantic segmentation by global convolutional network. In CVPR (pp. 4353–4361).

  • Perazzi, F., Krähenbühl, P., Pritch, Y., & Hornung, A. (2012). Saliency filters: Contrast based filtering for salient region detection. In CVPR (pp. 733–740).

  • Piao, Y., Ji, W., Li, J., Zhang, M., & Lu, H. (2019). Depth-induced multi-scale recurrent attention network for saliency detection. In ICCV (pp. 7254–7263).

  • Piao, Y., Rong, Z., Zhang, M., Ren, W., & Lu, H. (2020). A2dele: Adaptive and attentive depth distiller for efficient rgb-d salient object detection. In CVPR (pp. 9060–9069).

  • Qin, X., Zhang, Z., Huang, C., Gao, C., Dehghan, M., & Jagersand, M. (2019). Basnet: Boundary-aware salient object detection. In CVPR (pp. 7479–7489).

  • Qin, X., Zhang, Z., Huang, C., Dehghan, M., Zaiane, O. R., & Jagersand, M. (2020). U2-net: Going deeper with nested u-structure for salient object detection. Pattern Recognition, 106, 107404.

  • Ren, J., Hu, X., Zhu, L., Xu, X., Xu, Y., Wang, W., Deng, Z., & Heng, P.-A. (2021). Deep texture-aware features for camouflaged object detection. IEEE TCSVT.

  • Ren, Z., Gao, S., Chia, L.-T., & Tsang, I.W.-H. (2013). Region-based saliency detection and its application in object recognition. IEEE TCSVT, 24(5), 769–779.

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In MICCAI (pp. 234–241).

  • Zhao, R., Ouyang, W., & Wang, X. (2013). Unsupervised salience learning for person re-identification. In CVPR.

  • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In CVPR (pp. 4510–4520).

  • Shen, Y., Jia, X., & Meng, M. Q.-H. (2021a). Hrenet: A hard region enhancement network for polyp segmentation. In MICCAI (pp. 559–568).

  • Shen, Y., Jia, X., Pan, J., & Meng, M. Q.-H. (2021b). Aprnet: Alternative prediction refinement network for polyp segmentation. In IEEE EMBC (pp. 3114–3117).

  • Shen, Y., Lu, Y., Jia, X., Bai, F., & Meng, M. Q.-H. (2022). Task-relevant feature replenishment for cross-centre polyp segmentation. In MICCAI (pp. 599–608).

  • Shi, J., Xu, L., & Jia, J. (2014). Discriminative blur detection features. In CVPR (pp. 2965–2972).

  • Silva, J., Histace, A., Romain, O., Dray, X., & Granado, B. (2014). Toward embedded detection of polyps in wce images for early diagnosis of colorectal cancer. IJCARS, 9, 283–293.

  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

  • Siris, A., Jiao, J., Tam, G. K. L., Xie, X., & Lau, R. W. H. (2021). Scene context-aware salient object detection. In ICCV (pp. 4156–4166).

  • Skurowski, P., Abdulameer, H., Błaszczyk, J., Depta, T., Kornacki, A., & Kozieł, P. (2018). Animal camouflage analysis: Chameleon database. Unpublished Manuscript.

  • Song, M., Song, W., Yang, G., & Chen, C. (2022). Improving rgb-d salient object detection via modality-aware decoder. IEEE TIP, 31, 6124–6138.

  • Stevens, M., & Merilaita, S. (2009). Animal camouflage: Current issues and new perspectives. Philosophical Transactions of the Royal Society B: Biological Sciences, 364, 423–427.

  • Su, J., Li, J., Zhang, Y., Xia, C., & Tian, Y. (2019). Selectivity or invariance: Boundary-aware salient object detection. In ICCV (pp. 3799–3808).

  • Sun, F., Ren, P., Yin, B., Wang, F., & Li, H. (2023). Catnet: A cascaded and aggregated transformer network for rgb-d salient object detection. IEEE TMM.

  • Sun, Y., Wang, S., Chen, C., & Xiang, T.-Z. (2022). Boundary-guided camouflaged object detection. arXiv preprint arXiv:2207.00794.

  • Sun, P., Zhang, W., Wang, H., Li, S., & Li, X. (2021). Deep rgb-d saliency detection with depth-sensitive attention and automatic multi-modal fusion. In CVPR (pp. 1407–1417).

  • Tajbakhsh, N., Gurudu, S. R., & Liang, J. (2015). Automated polyp detection in colonoscopy videos using shape and context information. IEEE TMI, 35, 630–644.

  • Takahashi, N., & Mitsufuji, Y. (2021). Densely connected multidilated convolutional networks for dense prediction tasks. In CVPR (pp. 993–1002).

  • Tan, M., & Le, Q. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. In ICML (pp. 6105–6114).

  • Tang, L., Li, B., Zhong, Y., Ding, S., & Song, M. (2021). Disentangled high quality salient object detection. In ICCV (pp. 3580–3590).

  • Tang, C., Liu, X., Zhu, X., Zhu, E., Sun, K., Wang, P., Wang, L., & Zomaya, A. (2020c). R2mrf: Defocus blur detection via recurrently refining multi-scale residual features. In AAAI (pp. 12063–12070).

  • Tang, C., Zhu, X., Liu, X., Wang, L., & Zomaya, A. (2019). Defusionnet: Defocus blur detection via recurrently fusing and refining multi-scale deep features. In CVPR (pp. 2700–2709).

  • Tang, C., Liu, X., An, S., & Wang, P. (2020a). Br2net: Defocus blur detection via a bidirectional channel attention residual refining network. IEEE TMM, 23, 624–635.

  • Tang, B., Liu, Z., Tan, Y., & He, Q. (2022). Hrtransnet: Hrformer-driven two-modality salient object detection. IEEE TCSVT, 33, 728–742.

  • Tang, C., Liu, X., Zheng, X., Li, W., Xiong, J., Wang, L., Zomaya, A. Y., & Longo, A. (2020b). Defusionnet: Defocus blur detection via recurrently fusing and refining discriminative multi-scale deep features. IEEE TPAMI, 44, 955–968.

  • Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jégou, H. (2021a). Training data-efficient image transformers & distillation through attention. In ICML (pp. 10347–10357).

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In NeurIPS (pp. 5998–6008).

  • Vázquez, D., Bernal, J., Sánchez, F. J., Fernández-Esparrach, G., López, A. M., Romero, A., Drozdzal, M., & Courville, A. (2017). A benchmark for endoluminal scene segmentation of colonoscopy images. JHE.

  • Vicente, T. F. Y., Hoai, M., & Samaras, D. (2015). Leave-one-out kernel optimization for shadow detection. In ICCV (pp. 3388–3396).

  • Vicente, T. F. Y., Hou, L., Yu, C.-P., Hoai, M., & Samaras, D. (2016). Large-scale training of shadow detectors with noisily-annotated shadow examples. In ECCV (pp. 816–832).

  • Wang, Z., & Ji, S. (2018). Smoothed dilated convolutions for improved dense prediction. In ACM SIGKDD (pp. 2486–2495).

  • Wang, M., An, X., Li, Y., Li, N., Hang, W., & Liu, G. (2021). Ems-net: Enhanced multi-scale network for polyp segmentation. In IEEE EMBC (pp. 2936–2939).

  • Wang, B., Chen, Q., Zhou, M., Zhang, Z., Jin, X., & Gai, K. (2020). Progressive feature polishing network for salient object detection. In AAAI (pp. 12128–12135).

  • Wang, X., Girshick, R., Gupta, A., & He, K. (2018). Non-local neural networks. In CVPR (pp. 7794–7803).

  • Wang, J., Huang, Q., Tang, F., Meng, J., Su, J., & Song, S. (2022). Stepwise feature fusion: Local guides global. In MICCAI (pp. 110–120).

  • Wang, J., Li, X., & Yang, J. (2018). Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal. In CVPR (pp. 1788–1797).

  • Wang, L., Lu, H., Wang, Y., Feng, M., Wang, D., Yin, B., & Ruan, X. (2017). Learning to detect salient objects with image-level supervision. In CVPR (pp. 136–145).

  • Wang, W., Shen, J., Cheng, M.-M., & Shao, L. (2019). An iterative and cooperative top-down and bottom-up inference network for salient object detection. In CVPR (pp. 5968–5977).

  • Wang, Y., Wang, R., Fan, X., Wang, T., & He, X. (2023). Pixels, regions, and objects: Multiple enhancement for salient object detection. In CVPR (pp. 10031–10040).

  • Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D., Lu, T., Luo, P., & Shao, L. (2021). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In ICCV (pp. 568–578).

  • Wang, T., Zhang, L., Wang, S., Lu, H., Yang, G., Ruan, X., & Borji, A. (2018). Detect globally, refine locally: A novel approach to saliency detection. In CVPR (pp. 3127–3135).

  • Wang, W., Zhao, S., Shen, J., Hoi, S. C. H., & Borji, A. (2019). Salient object detection with pyramid attention and salient edges. In CVPR (pp. 1448–1457).

  • Wang, N., & Gong, X. (2019). Adaptive fusion for rgb-d salient object detection. IEEE Access, 7, 55277–55284.

  • Wang, Q., Liu, Y., Xiong, Z., & Yuan, Y. (2022). Hybrid feature aligned network for salient object detection in optical remote sensing imagery. IEEE TGRS, 60, 1–15.

  • Wang, F., Pan, J., Xu, S., & Tang, J. (2022). Learning discriminative cross-modality features for rgb-d saliency detection. IEEE TIP, 31, 1285–1297.

  • Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D., Lu, T., Luo, P., & Shao, L. (2022). Pvt v2: Improved baselines with pyramid vision transformer. Computational Visual Media, 8, 415–424.

  • Ji, W., Li, J., Zhang, M., Piao, Y., & Lu, H. (2020). Accurate rgb-d salient object detection via collaborative learning. In ECCV (pp. 52–69).

  • Ji, W., Li, J., Yu, S., Zhang, M., Piao, Y., Yao, S., Bi, Q., Ma, K., Zheng, Y., Lu, H., et al. (2021). Calibrated rgb-d salient object detection. In CVPR (pp. 9471–9481).

  • Wei, J., Hu, Y., Li, G., Cui, S., Zhou, S. K., & Li, Z. (2022). Boxpolyp: Boost generalized polyp segmentation using extra coarse bounding box annotations. In MICCAI (pp. 67–77).

  • Wei, J., Hu, Y., Zhang, R., Li, Z., Zhou, S. K. & Cui, S. (2021). Shallow attention network for polyp segmentation. In MICCAI (pp. 699–708).

  • Wei, J., Wang, S., & Huang, Q. (2020a). F3Net: Fusion, feedback and focus for salient object detection. In AAAI (pp. 12321–12328).

  • Wei, J., Wang, S., Wu, Z., Su, C., Huang, Q., & Tian, Q. (2020b). Label decoupling framework for salient object detection. In CVPR (pp. 13025–13034).

  • Wen, H., Yan, C., Zhou, X., Cong, R., Sun, Y., Zheng, B., Zhang, J., Bao, Y., & Ding, G. (2021). Dynamic selective network for rgb-d salient object detection. IEEE TIP, 30, 9179–9192.

  • Wu, R., Feng, M., Guan, W., Wang, D., Lu, H., & Ding, E. (2019a). A mutual learning method for salient object detection with intertwined multi-supervision. In CVPR (pp. 8150–8159).

  • Wu, Z., Paudel, D. P., Fan, D.-P., Wang, J., Wang, S., Demonceaux, C., Timofte, R., & Gool, L. V. (2023). Source-free depth for object pop-out. In ICCV (pp. 1032–1042).

  • Wu, Z., Su, L., & Huang, Q. (2019a). Cascaded partial decoder for fast and accurate salient object detection. In CVPR (pp. 3907–3916).

  • Wu, Z., Su, L., & Huang, Q. (2019b). Stacked cross refinement network for edge-aware salient object detection. In ICCV (pp. 7264–7273).

  • Wu, T., Tang, S., Zhang, R., Cao, J., & Li, J. (2019b). Tree-structured Kronecker convolutional network for semantic segmentation. In ICME (pp. 940–945).

  • Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L., & Zhang, L. (2021a). Cvt: Introducing convolutions to vision transformers. In ICCV (pp. 22–31).

  • Wu, H., Zhong, J., Wang, W., Wen, Z., & Qin, J. (2021b). Precise yet efficient semantic calibration and refinement in convnets for real-time polyp segmentation from colonoscopy videos. In AAAI (pp. 2916–2924).

  • Wu, Y.-H., Liu, Y., Xu, J., Bian, J.-W., Gu, Y.-C., & Cheng, M.-M. (2021). Mobilesal: Extremely efficient rgb-d salient object detection. IEEE TPAMI, 44, 10261–10269.

  • Wu, Y.-H., Liu, Y., Zhang, L., Cheng, M.-M., & Ren, B. (2022). Edn: Salient object detection via extremely-downsampled network. IEEE TIP, 31, 3125–3136.

  • Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks. In CVPR (pp. 1492–1500).

  • Xie, E., Wang, W., Wang, W., Ding, M., Shen, C., & Luo, P. (2020). Segmenting transparent objects in the wild. In ECCV (pp. 696–711).

  • Xie, E., Wang, W., Wang, W., Sun, P., Xu, H., Liang, D., & Luo, P. (2021). Segmenting transparent objects in the wild with transformer. In IJCAI (pp. 1194–1200).

  • Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). Segformer: Simple and efficient design for semantic segmentation with transformers. In NeurIPS (pp. 12077–12090).

  • Xie, C., Xia, C., Ma, M., Zhao, Z., Chen, X., & Li, J. (2022). Pyramid grafting network for one-stage high resolution saliency detection. In CVPR (pp. 11717–11726).

  • Xu, B., Liang, H., Liang, R., & Chen, P. (2021). Locate globally, segment locally: A progressive architecture with knowledge review network for salient object detection. In AAAI (pp. 3004–3012).

  • Xu, Y., Xu, D., Hong, X., Ouyang, W., Ji, R., Xu, M., & Zhao, G. (2019). Structured modeling of joint deep feature and prediction refinement for salient object detection. In ICCV (pp. 3789–3798).

  • Yan, Q., Xu, L., Shi, J., & Jia, J. (2013). Hierarchical saliency detection. In CVPR (pp. 1155–1162).

  • Yang, X., Mei, H., Xu, K., Wei, X., Yin, B., & Lau, R. W. H. (2019). Where is my mirror? In ICCV (pp. 8809–8818).

  • Yang, H., Wang, T., Hu, X., & Fu, C.-W. (2023). Silt: Shadow-aware iterative label tuning for learning to detect shadows from noisy labels. In ICCV (pp. 12687–12698).

  • Yang, M., Yu, K., Zhang, C., Li, Z., & Yang, K. (2018). Denseaspp for semantic segmentation in street scenes. In CVPR (pp. 3684–3692).

  • Yang, F., Zhai, Q., Li, X., Huang, R., Luo, A., Cheng, H., & Fan, D.-P. (2021). Uncertainty-guided transformer reasoning for camouflaged object detection. In ICCV (pp. 4146–4155).

  • Yang, C., Zhang, L., Lu, H., Ruan, X., & Yang, M.-H. (2013). Saliency detection via graph-based manifold ranking. In CVPR (pp. 3166–3173).

  • Yang, G. R., Murray, J. D., & Wang, X.-J. (2016). A dendritic disinhibitory circuit mechanism for pathway-specific gating. Nature Communications, 7, 12815.

  • Yan, J., Le, T.-N., Nguyen, K.-D., Tran, M.-T., Do, T.-T., & Nguyen, T. V. (2021). Mirrornet: Bio-inspired camouflaged object segmentation. IEEE Access, 9, 43290–43300.

  • Yuan, L., Chen, Y., Wang, T., Yu, W., Shi, Y., Jiang, Z.-H., Tay, F. E. H., Feng, J., & Yan, S. (2021). Tokens-to-token vit: Training vision transformers from scratch on imagenet. In ICCV (pp. 558–567).

  • Zeng, Y., Zhang, P., Zhang, J., Lin, Z., & Lu, H. (2019). Towards high-resolution salient object detection. In ICCV (pp. 7234–7243).

  • Zhai, Q., Li, X., Yang, F., Chen, C., Cheng, H., & Fan, D.-P. (2021). Mutual graph learning for camouflaged object detection. In CVPR (pp. 12997–13007).

  • Zhai, Q., Li, X., Yang, F., Jiao, Z., Luo, P., Cheng, H., & Liu, Z. (2022). Mgl: Mutual graph learning for camouflaged object detection. IEEE TIP, 32, 1897–1910.

  • Zhang, C., Cong, R., Lin, Q., Ma, L., Li, F., Zhao, Y., & Kwong, S. (2021). Cross-modality discrepant interaction network for rgb-d salient object detection. In ACM MM (pp. 2094–2102).

  • Zhang, L., Dai, J., Lu, H., He, Y., & Wang, G. (2018b). A bi-directional message passing model for salient object detection. In CVPR (pp. 1741–1750).

  • Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., & Agrawal, A. (2018a). Context encoding for semantic segmentation. In CVPR (pp. 7151–7160).

  • Zhang, J., Fan, D.-P., Dai, Y., Anwar, S., Saleh, F. S., Zhang, T., & Barnes, N. (2020). Uc-net: Uncertainty inspired rgb-d saliency detection via conditional variational autoencoders. In CVPR (pp. 8582–8591).

  • Zhang, J., Fan, D.-P., Dai, Y., Yu, X., Zhong, Y., Barnes, N., & Shao, L. (2021). Rgb-d saliency detection via cascaded mutual information minimization. In ICCV (pp. 4338–4347).

  • Zhang, M., Fei, S. X., Liu, J., Xu, S., Piao, Y., & Lu, H. (2020). Asymmetric two-stream architecture for accurate rgb-d saliency detection. In ECCV (pp. 374–390).

  • Zhang, W., Ji, G.-P., Wang, Z., Fu, K., & Zhao, Q. (2021). Depth quality-inspired feature manipulation for efficient rgb-d salient object detection. In ACM MM (pp. 731–740).

  • Zhang, R., Lai, P., Wan, X., Fan, D.-J., Gao, F., Wu, X.-J., & Li, G. (2022). Lesion-aware dynamic kernel for polyp segmentation. In MICCAI (pp. 99–109).

  • Zhang, R., Li, G., Li, Z., Cui, S., Qian, D., & Yu, Y. (2020). Adaptive context selection for polyp segmentation. In MICCAI (pp. 253–262).

  • Zhang, Y., Liu, H., & Hu, Q. (2021). Transfuse: Fusing transformers and cnns for medical image segmentation. In MICCAI (pp. 14–24).

  • Zhang, P., Liu, W., Lu, H., & Shen, C. (2018). Salient object detection by lossless feature reflection. In IJCAI (pp. 1149–1155).

  • Zhang, M., Liu, T., Piao, Y., Yao, S., & Lu, H. (2021). Auto-msfnet: Search multi-scale fusion network for salient object detection. In ACM MM (pp. 667–676).

  • Zhang, M., Ren, W., Piao, Y., Rong, Z., & Lu, H. (2020). Select, supplement and focus for rgb-d saliency detection. In CVPR (pp. 3472–3481).

  • Zhang, P., Wang, D., Lu, H., Wang, H., & Ruan, X. (2017). Amulet: Aggregating multi-level convolutional features for salient object detection. In ICCV (pp. 202–211).

  • Zhang, X., Wang, T., Qi, J., Lu, H., & Wang, G. (2018). Progressive attention guided recurrent network for salient object detection. In CVPR (pp. 714–722).

  • Zhang, M., Xu, S., Piao, Y., Shi, D., Lin, S., & Lu, H. (2022). Preynet: Preying on camouflaged objects. In ACM MM (pp. 5323–5332).

  • Zhang, M., Yao, S., Hu, B., Piao, Y., & Ji, W. (2020). C2dfnet: Criss-cross dynamic filter network for rgb-d salient object detection. IEEE TMM.

  • Zhang, L., Zhang, J., Lin, Z., Lu, H., & He, Y. (2019). Capsal: Leveraging captioning to boost semantics for salient object detection. In CVPR (pp. 6024–6033).

  • Zhang, M., Zhang, Y., Piao, Y., Hu, B., & Lu, H. (2020). Feature reintegration over differential treatment: A top-down and adaptive fusion network for rgb-d salient object detection. In ACM MM (pp. 4107–4115).

  • Zhang, W., Zheng, L., Wang, H., Wu, X., & Li, X. (2022). Saliency hierarchy modeling via generative kernels for salient object detection. In ECCV (pp. 570–587).

  • Zhang, Q., Cong, R., Li, C., Cheng, M.-M., Fang, Y., Cao, X., Zhao, Y., & Kwong, S. (2020). Dense attention fluid network for salient object detection in optical remote sensing images. IEEE TIP, 30, 1305–1317.

  • Zhao, T., & Wu, X. (2019). Pyramid feature attention network for saliency detection. In CVPR (pp. 3085–3094).

  • Zhao, J.-X., Cao, Y., Fan, D.-P., Cheng, M.-M., Li, X.-Y., & Zhang, L. (2019). Contrast prior and fluid pyramid integration for rgbd salient object detection. In CVPR (pp. 3922–3931).

  • Zhao, J.-X., Liu, J.-J., Fan, D.-P., Cao, Y., Yang, J., & Cheng, M.-M. (2019). Egnet: Edge guidance network for salient object detection. In ICCV (pp. 8779–8788).

  • Zhao, F., Lu, H., Zhao, W., & Yao, L. (2021). Image-scale-symmetric cooperative network for defocus blur detection. IEEE TCSVT.

  • Zhao, X., Pang, Y., Zhang, L., Lu, H., & Zhang, L. (2020). Suppress and balance: A simple gated network for salient object detection. In ECCV (pp. 35–51).

  • Zhao, W., Shang, C., & Lu, H. (2021). Self-generated defocus blur detection via dual adversarial discriminators. In CVPR (pp. 6933–6942).

  • Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid scene parsing network. In CVPR (pp. 2881–2890).

  • Zhao, W., Wei, F., He, Y., & Lu, H. (2022). United defocus blur detection and deblurring via adversarial promoting learning. In ECCV (pp. 569–586).

  • Zhao, W., Wei, F., Wang, H., He, Y., & Lu, H. (2023). Full-scene defocus blur detection with defbd+ via multi-level distillation learning. IEEE TMM.

  • Zhao, Z., Xia, C., Xie, C., & Li, J. (2021). Complementary trilateral decoder for fast and accurate salient object detection. In ACM MM (pp. 4967–4975).

  • Zhao, X., Zhang, L., & Lu, H. (2021). Automatic polyp segmentation via multi-scale subtraction network. In MICCAI (pp. 120–130).

  • Zhao, X., Zhang, L., Pang, Y., Lu, H., & Zhang, L. (2020). A single stream network for robust and real-time rgb-d salient object detection. In ECCV (pp. 646–662).

  • Zhao, J., Zhao, Y., Li, J., & Chen, X. (2020). Is depth really necessary for salient object detection? In ACM MM (pp. 1745–1754).

  • Zhao, W., Zhao, F., Wang, D., & Lu, H. (2018). Defocus blur detection via multi-stream bottom-top-bottom fully convolutional network. In CVPR (pp. 3080–3088).

  • Zhao, W., Zheng, B., Lin, Q., & Lu, H. (2019). Enhancing diversity of defocus blur detectors via cross-ensemble network. In CVPR (pp. 8905–8913).

  • Zhao, W., Hou, X., He, Y., & Lu, H. (2021). Defocus blur detection via boosting diversity of deep ensemble networks. IEEE TIP, 30, 5426–5438.

  • Zhao, Y., Zhao, J., Li, J., & Chen, X. (2021). Rgb-d salient object detection with ubiquitous target awareness. IEEE TIP, 30, 7717–7731.

  • Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P. H. S. et al. (2021). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In CVPR (pp. 6881–6890).

  • Zheng, Q., Qiao, X., Cao, Y., & Lau, R. W. H. (2019). Distraction-aware shadow detection. In CVPR (pp. 5167–5176).

  • Zheng, J., Quan, Y., Zheng, H., Wang, Y., & Pan, X. (2023). Orsi salient object detection via cross-scale interaction and enlarged receptive field. IEEE GRSL, 20, 1–5.

  • Tu, Z., Wang, C., Li, C., Fan, M., Zhao, H., & Luo, B. (2022). Orsi salient object detection via multiscale joint region and boundary model. IEEE TGRS, 60, 1–13.

  • Zhong, Y., Li, B., Tang, L., Kuang, S., Wu, S., & Ding, S. (2022). Detecting camouflaged object in frequency domain. In CVPR (pp. 4504–4513).

  • Zhou, T., Fu, H., Chen, G., Zhou, Y., Fan, D.-P., & Shao, L. (2021). Specificity-preserving rgb-d saliency detection. In ICCV (pp. 4681–4691).

  • Zhou, J., Wang, L., Lu, H., Huang, K., Shi, X., & Liu, B. (2022). Mvsalnet: Multi-view augmentation for rgb-d salient object detection. In ECCV (pp. 270–287).

  • Zhou, Z., Wang, Z., Lu, H., Wang, S., & Sun, M. (2020). Multi-type self-attention guided degraded saliency detection. In AAAI (pp. 13082–13089).

  • Zhou, H., Xie, X., Lai, J.-H., Chen, Z., & Yang, L. (2020). Interactive two-stream decoder for accurate and fast saliency detection. In CVPR (pp. 9141–9150).

  • Zhou, X., Shen, K., Liu, Z., Gong, C., Zhang, J., & Yan, C. C. (2022). Edge-aware multiscale feature integration network for salient object detection in optical remote sensing images. IEEE TGRS, 60, 1–15.

  • Zhou, T., Zhou, Y., Gong, C., Yang, J., & Zhang, Y. (2022). Feature aggregation and propagation network for camouflaged object detection. IEEE TIP, 31, 7036–7047.

  • Zhou, T., Zhou, Y., He, K., Gong, C., Yang, J., Fu, H., & Shen, D. (2023). Cross-level feature aggregation network for polyp segmentation. Pattern Recognition, 140, 109555.

  • Zhou, Z., Siddiquee, M. M. R., Tajbakhsh, N., & Liang, J. (2019). Unet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE TMI, 39, 1856–1867.

  • Zhu, C., & Li, G. (2017). A three-pathway psychobiological framework of salient object detection using stereoscopic technology. In ICCVW (pp. 3008–3014).

  • Zhu, C., Cai, X., Huang, K., Li, T. H., & Li, G. (2019). Pdnet: Prior-model guided depth-enhanced network for salient object detection. In ICME (pp. 199–204).

  • Zhu, L., Deng, Z., Hu, X., Fu, C.-W., Xu, X., Qin, J., & Heng, P.-A. (2018). Bidirectional feature pyramid network with recurrent attention residual modules for shadow detection. In ECCV (pp. 121–136).

  • Zhu, Y., Qiu, J., & Ren, B. (2021b). Transfusion: A novel slam method focused on transparent objects. In ICCV (pp. 6019–6028).

  • Zhu, J., Samuel, K. G. G., Masood, S. Z., & Tappen, M. F. (2010). Learning to recognize shadows in monochromatic natural images. In CVPR (pp. 223–230).

  • Zhu, L., Xu, K., Ke, Z., & Lau, R. W. H. (2021a). Mitigating intensity bias in shadow detection via feature decomposition and reweighting. In ICCV (pp. 4702–4711).

  • Zhuang, B., Liu, J., Pan, Z., He, H., Weng, Y., & Shen, C. (2023). A survey on efficient training of transformers. arXiv preprint arXiv:2302.01107.

  • Zhuge, Y., Zeng, Y., & Lu, H. (2019). Deep embedding features for salient object detection. In AAAI (pp. 9340–9347).

  • Ziegler, T., Fritsche, M., Kuhn, L., & Donhauser, K. (2019). Efficient smoothing of dilated convolutions for image segmentation. arXiv preprint arXiv:1903.07992.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62276046, and in part by the Dalian Science and Technology Innovation Foundation under Grant 2023JJ12GX015.

Author information

Corresponding author

Correspondence to Lihe Zhang.

Additional information

Communicated by Karteek Alahari.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Qualitative Evaluation

Figures 15, 16, 17, 18, 19, 20, 21, 22, 23 and 24 illustrate visual comparisons on each sub-task. We summarize the advantages of GateNet over other methods when facing the following challenges:

I) Interference from complex backgrounds. In the camouflaged object detection and polyp segmentation tasks, foreground objects usually share a similar appearance with the background, which can easily deceive predictors, but GateNet can accurately capture the hidden objects and separate them from the surrounding environment (see Figs. 23 and 24). The gated mechanism also plays an important role in RGB-D salient object detection. As shown in Fig. 17, the proposed two-stream GateNet can effectively utilize the guidance provided by a high-quality depth map while suppressing the interference from a low-quality one, thereby identifying the whole object precisely.

II) Interference from adjacent objects. In the real world, shadows often fall on the ground or a desktop and lie closely adjacent to the objects that cast them. This characteristic requires shadow detection networks to distinguish between adjacent objects. As shown in Fig. 19, most methods are disturbed by the surface or the casting object, whereas our method focuses on the shadow regions.

III) Multiple or small foreground objects. On the one hand, glass-like objects often appear in groups in the real world, which poses a serious challenge to the network's ability to perceive multiple objects. On the other hand, small objects frequently appear in remote sensing images. Benefiting from the Fold-ASPP, both multiple and small objects can be localized accurately. Figures 18 and 21 show that our method can accurately separate each independent connected region without sticking them together, and GateNet is the only method that provides clean prediction maps while maintaining the basic shape of the aircraft (see the 6th–8th columns in Fig. 16).

IV) Boundaries and details. Our GateNet is equipped with a mixed feature aggregation decoder, in which a parallel branch concatenates the output of the progressive branch with the features of the gated encoder, so that residual information complementary to the progressive branch is supplemented to generate the final prediction. In this way, the prediction can restore more details; the limbs and even the antennae of the insects are retained well (see the 3rd and 8th columns in Fig. 23).

V) Regional consistency. In the defocus blur detection task, the focused area usually has incomplete semantic information because the blurred region may also belong to a semantic part of the foreground. Benefiting from the Fold operation, our model can obtain more stable structural features that improve intra-class consistency. From the results in Fig. 20, it can be observed that our method segments the foreground well, while the other methods more or less lose similar areas inside or around the focused regions.
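
As a rough illustration of the Fold idea discussed above, the sketch below folds each 2×2 neighborhood into the channel dimension, applies an atrous convolution at the reduced resolution so that every dilated sampling position aggregates a small patch rather than an isolated pixel, and then unfolds the result back to the original resolution. This is one plausible PyTorch reading of the folded atrous convolution, with an assumed 2×2 fold size and illustrative names, not the paper's official implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FoldedAtrousConv(nn.Module):
    """Sketch of one folded atrous convolution branch (assumed variant, not the official code)."""

    def __init__(self, channels, dilation):
        super().__init__()
        # folding a 2x2 neighborhood multiplies the channel count by 4
        self.conv = nn.Conv2d(channels * 4, channels * 4, kernel_size=3,
                              padding=dilation, dilation=dilation)

    def forward(self, x):
        # requires even H and W
        folded = F.pixel_unshuffle(x, 2)   # (B, 4C, H/2, W/2): each 2x2 patch becomes channels
        out = self.conv(folded)            # atrous sampling now aggregates patches, not pixels
        return F.pixel_shuffle(out, 2)     # back to (B, C, H, W)

# usage: one branch of an ASPP-style module with dilation rate 6
branch = FoldedAtrousConv(channels=64, dilation=6)
x = torch.randn(1, 64, 64, 64)
print(branch(x).shape)  # torch.Size([1, 64, 64, 64])

In an ASPP- or DenseASPP-style module, several such branches with different dilation rates would run in parallel and their outputs would be concatenated.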

Fig. 15

Visual comparison between our GateNet results and the state-of-the-art methods [CTDNet (Zhao et al., 2021), VST (Liu et al., 2021a), LDF (Wei et al., 2020b), Auto-MSF (Zhang et al., 2021), KRN (Xu et al., 2021), MINet (Pang et al., 2020b), ITSD (Zhou et al., 2020), F3Net (Wei et al., 2020a)] on RGB SOD datasets

Fig. 16

Visual comparison between our GateNet results and the state-of-the-art methods [RRNet (Cong et al., 2022b), MJRBM (Tu et al., 2022), DAFNet (Zhang et al., 2020)] on ORSI SOD datasets

Fig. 17

Visual comparison between our GateNet results and the state-of-the-art methods [TriTransNet (Liu et al., 2021d), SPNet (Zhou et al., 2021), DSNet (Wen et al., 2021), UTA (Zhao et al., 2021), RD3D (Chen et al., 2021), DCF (Ji et al., 2021)] on RGB-D SOD datasets

Fig. 18

Visual comparison between our GateNet results and the state-of-the-art methods [EBLNet (He et al., 2021), GDNet (Mei et al., 2020)] on Glass Object Detection datasets

Fig. 19

Visual comparison between our GateNet results and the state-of-the-art methods [DSD (Zheng et al., 2019), BDRAR (Zhu et al., 2018), ADNet (Le et al., 2018), DSC (Hu et al., 2018)] on Shadow Detection datasets

Fig. 20

Visual comparison between our GateNet results and the state-of-the-art methods [DENets (Zhao et al., 2021), IS2CNet (Zhao et al., 2021), SG (Zhao et al., 2021), Depth-Distill (Cun & Pun, 2020), CENet (Zhao et al., 2019)] on Defocus Blur Detection datasets

Fig. 21

Visual comparison between our GateNet results and the state-of-the-art method [Translab (Xie et al., 2020)] on Transparent Object Detection datasets

Fig. 22

Visual comparison between our GateNet results and the state-of-the-art method [MirrorNet (Yang et al., 2019)] on Mirror Detection datasets

Fig. 23

Visual comparison between our GateNet results and the state-of-the-art methods [UGTR (Yang et al., 2021), IS2CNet (Zhai et al., 2021), RankNet (Lv et al., 2021), PFNet (Mei et al., 2021b), SINet (Fan et al., 2020a)] on Camouflaged Object Detection datasets

Fig. 24

Visual comparison between our GateNet results and the state-of-the-art methods [UACA (Kim et al., 2021), MSNet (Zhao et al., 2021), SANet (Wei et al., 2021), PraNet (Fan et al., 2020b), SFA (Fang et al., 2019), UNet++ (Zhou et al., 2019), UNet (Ronneberger et al., 2015)] on Polyp Segmentation datasets

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, X., Pang, Y., Zhang, L. et al. Towards Diverse Binary Segmentation via a Simple yet General Gated Network. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-02058-y
