Spatial-channel transformer network based on mask-RCNN for efficient mushroom instance segmentation

Jiaoling Wang, Weidong Song, Wengang Zheng, Qingchun Feng, Mingfei Wang, Chunjiang Zhao

Abstract


Edible mushrooms are rich in nutrients; however, harvesting still relies mainly on manual labor. Each mushroom must first be coarsely localized before a robotic arm can pick it accurately. Previous studies used object detection algorithms that do not consider pixel-level mushroom information, so this information is lost when their outputs are combined with a depth map. Moreover, among instance segmentation algorithms, convolutional neural network (CNN)-based methods are lightweight, but the features they extract lack global correlation. To guarantee real-time localization and improve mushroom segmentation accuracy, this study proposes a new spatial-channel transformer network based on Mask-RCNN (SCT-Mask-RCNN). Fusing Mask-RCNN with a self-attention mechanism captures the global correlations of image features along both the channel and spatial dimensions. Mask-RCNN also keeps the structure lightweight and extracts local features with a spatial pyramid pooling structure, fusing multiscale local features to improve detection accuracy. The results showed that SCT-Mask-RCNN achieved a segmentation accuracy of 0.750 on segm_Precision_mAP and a detection accuracy of 0.638 on Bbox_Precision_mAP. Compared with existing methods, the proposed method improved Bbox_Precision_mAP and segm_Precision_mAP by more than 2% and 5%, respectively.
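To make the two mechanisms named in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' SCT-Mask-RCNN code. It assumes a DANet-style design in which a C x N feature map (N = H x W) attends over channels (a C x C affinity) and over spatial positions (an N x N affinity), with both outputs added back to the input as a residual. The learned query/key/value projections, multiple heads and normalization of a real transformer block are deliberately omitted. The spatial pyramid pooling part assumes max-pooling over 1x1, 2x2 and 4x4 grids. All function names and bin sizes are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of affinities.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(a, b):
    # a: n x k, b: k x m  ->  n x m
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def channel_attention(feat):
    """feat: C x N. Each channel becomes a weighted mix of all channels."""
    weights = [softmax(row) for row in matmul(feat, transpose(feat))]  # C x C
    return matmul(weights, feat)  # C x N

def spatial_attention(feat):
    """Each spatial position becomes a weighted mix of all positions."""
    x = transpose(feat)  # N x C
    weights = [softmax(row) for row in matmul(x, transpose(x))]  # N x N
    return transpose(matmul(weights, x))  # back to C x N

def spatial_channel_attention(feat):
    # Assumption: residual sum of the input and both attention branches.
    ca = channel_attention(feat)
    sa = spatial_attention(feat)
    return [[f + c + s for f, c, s in zip(fr, cr, sr)]
            for fr, cr, sr in zip(feat, ca, sa)]

def spatial_pyramid_pool(channel, h, w, levels=(1, 2, 4)):
    """Max-pool one H x W channel (flattened row-major) on n x n grids per level."""
    grid = [channel[r * w:(r + 1) * w] for r in range(h)]
    out = []
    for n in levels:
        for i in range(n):
            # Bin rows: floor(i*h/n) .. ceil((i+1)*h/n), never empty.
            r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                out.append(max(grid[r][c] for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out

# Toy usage: C = 2 channels on a 2 x 2 map (N = 4), pooled at 1x1 and 2x2.
feat = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
fused = spatial_channel_attention(feat)
pooled = [spatial_pyramid_pool(ch, 2, 2, levels=(1, 2)) for ch in fused]
```

Because the affinity matrices span the whole feature map, every output value depends on all channels or all positions, which is the kind of global correlation that plain convolution lacks; the pyramid bins then summarize local structure at several scales into a fixed-length vector (1 + 4 + 16 = 21 values per channel with the default levels).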
Key words: edible mushrooms; picking; instance segmentation; deep learning; algorithm
DOI: 10.25165/j.ijabe.20241704.8987

Citation: Wang J L, Song W D, Zheng W G, Feng Q C, Wang M F, Zhao C J. Spatial-channel transformer network based on mask-RCNN for efficient mushroom instance segmentation. Int J Agric & Biol Eng, 2024; 17(4): 227–235.







Copyright (c) 2024 International Journal of Agricultural and Biological Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.
