Citrus detection algorithm in natural environment based on Dense-TRU-YOLO

Taixiong Zheng, Yilin Zhu, Siyu Liu, Yongfu Li, Mingzhe Jiang

Abstract


Accurate detection of citrus in natural orchards is crucial for citrus-picking robots. However, the task remains challenging due to variable illumination, severe occlusion by branches and leaves, and overlapping fruit. To this end, a Dense-TRU-YOLO model was proposed, which integrated the Denseblock with the Transformer and used the UNet++ network as the neck structure. First, the Denseblock structure was incorporated into YOLOv5, adding shallow semantic information to the deep part of the network and improving the flow of information and gradients. Second, the deepest Cross Stage Partial (CSP) bottleneck with 3 convolutions (C3) module of the backbone was replaced by the CSP Transformer with 3 convolutions (C3TR) module, which increased the semantic resolution and improved detection accuracy under occlusion. Finally, the neck of the original network was replaced by a combined UNet++ feature pyramid network (UNet++-FPN) structure, which not only added cross-weighted links between nodes of the same scale but also enhanced feature fusion between nodes of different scales, making the network's regression to target boundaries more accurate. (Illustrative sketches of these three modifications follow the abstract.) Ablation and comparison experiments showed that Dense-TRU-YOLO effectively improves the detection accuracy of citrus under severe occlusion and overlap. The precision, recall, mAP@0.5, and F1 score were 90.8%, 87.6%, 90.5%, and 87.9%, respectively. The precision of Dense-TRU-YOLO was the highest among the models compared: 3.9%, 6.45%, 1.9%, 7.4%, 3.3%, 4.9%, and 9.9% higher than that of YOLOv5-s, YOLOv3, YOLOv5-n, YOLOv4-tiny, YOLOv4, YOLOX, and YOLOF, respectively. In addition, its inference time was 9.2 ms, 1.7 ms, 10.5 ms, and 2.3 ms shorter than that of YOLOv3, YOLOv5-n, YOLOv4, and YOLOX, respectively. Dense-TRU-YOLO is designed to enhance the accuracy of fruit recognition in natural settings and to boost the detection of small targets at extended ranges.
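
For readers who want a concrete picture of the first modification, the following is a minimal PyTorch sketch of DenseNet-style dense connectivity, the mechanism the abstract credits with carrying shallow semantic information deeper into the backbone and easing gradient flow. The layer count, growth rate, and activation choice are illustrative assumptions, not the authors' exact configuration.

```python
# DenseNet-style block: each layer receives the concatenation of all
# preceding feature maps (illustrative sketch, not the paper's exact module).
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.SiLU(inplace=True),  # SiLU to match YOLOv5's activation (assumption)
            nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the newly computed features onto everything seen so far,
        # so shallow features remain directly accessible to deeper layers.
        return torch.cat([x, self.conv(x)], dim=1)

class DenseBlock(nn.Module):
    def __init__(self, in_channels: int, growth_rate: int = 32, num_layers: int = 4):
        super().__init__()
        layers, channels = [], in_channels
        for _ in range(num_layers):
            layers.append(DenseLayer(channels, growth_rate))
            channels += growth_rate  # channel count grows with every layer
        self.block = nn.Sequential(*layers)
        self.out_channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

if __name__ == "__main__":
    block = DenseBlock(in_channels=64)
    y = block(torch.randn(1, 64, 80, 80))
    print(y.shape)  # torch.Size([1, 192, 80, 80]) -> 64 + 4 * 32 channels
```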
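
The C3TR substitution can be sketched in the same spirit: a CSP-style split whose main branch applies self-attention over flattened spatial positions instead of convolutional bottlenecks, following the general pattern of the public YOLOv5 codebase. The channel widths, head count, and the omission of normalization below are simplifying assumptions.

```python
# C3TR-style module sketch: a CSP split where the main branch runs
# transformer blocks (self-attention over spatial positions).
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, c: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(c, num_heads, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(c, c), nn.SiLU(), nn.Linear(c, c))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)       # (B, H*W, C): one token per pixel
        seq = seq + self.attn(seq, seq, seq)[0]  # residual self-attention
        seq = seq + self.fc(seq)                 # residual feed-forward
        return seq.transpose(1, 2).reshape(b, c, h, w)

class C3TR(nn.Module):
    """CSP-style module whose main branch is a transformer stack (sketch)."""
    def __init__(self, c_in: int, c_out: int, n: int = 1):
        super().__init__()
        c_hidden = c_out // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1, bias=False)  # main branch
        self.cv2 = nn.Conv2d(c_in, c_hidden, 1, bias=False)  # shortcut branch
        self.m = nn.Sequential(*(TransformerBlock(c_hidden) for _ in range(n)))
        self.cv3 = nn.Conv2d(2 * c_hidden, c_out, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.cv3(torch.cat([self.m(self.cv1(x)), self.cv2(x)], dim=1))

if __name__ == "__main__":
    m = C3TR(256, 256, n=1)
    print(m(torch.randn(1, 256, 20, 20)).shape)  # torch.Size([1, 256, 20, 20])
```

Applying such a block only at the deepest backbone stage keeps the quadratic attention cost manageable, since the coarsest feature map has the fewest spatial positions.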
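
Finally, the UNet++-FPN neck can be pictured as a grid of fusion nodes in which each node concatenates all earlier same-scale nodes (the dense cross links) with an upsampled coarser-scale node. The toy node below assumes plain upsample-concatenate-convolve aggregation; the authors' exact wiring and weighting are not specified in the abstract.

```python
# Toy UNet++-style fusion node: dense same-scale skips plus one upsampled
# coarser-scale input, merged by concatenation and a 3x3 convolution.
from typing import List
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNode(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(inplace=True),
        )

    def forward(self, same_scale: List[torch.Tensor], coarser: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser map to match the fine scale, then fuse.
        up = F.interpolate(coarser, scale_factor=2, mode="nearest")
        return self.conv(torch.cat(same_scale + [up], dim=1))

if __name__ == "__main__":
    # Two earlier nodes at the fine scale (dense skips) plus one coarser node.
    x00 = torch.randn(1, 64, 80, 80)
    x01 = torch.randn(1, 64, 80, 80)
    x10 = torch.randn(1, 128, 40, 40)
    node = FusionNode(in_channels=64 + 64 + 128, out_channels=64)
    print(node([x00, x01], x10).shape)  # torch.Size([1, 64, 80, 80])
```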
Keywords: citrus, picking robot, Dense-TRU-YOLO, Denseblock, UNet++-FPN
DOI: 10.25165/j.ijabe.20251801.8866

Citation: Zheng T X, Zhu Y L, Liu S Y, Li Y F, Jiang M Z. Detection of citrus in the natural environment using Dense-TRU-YOLO. Int J Agric & Biol Eng, 2025; 18(1): 260–266.

