Why TPDNet
The growth of the global population has increased the demand for fruits and vegetables, while high harvesting labor costs (accounting for 30\% to 50\% of total costs) severely constrain industry development. Currently, relevant personnel primarily utilize 2D object detection technology to achieve automated harvesting, aiming to reduce labor costs. However, 2D detection technology is limited to providing planar information and cannot meet the requirements of scenarios that need 3D spatial data, whereas 3D object detection technology can effectively address these needs, including point cloud-based methods and monocular-based methods. Since point cloud-based object detection methods require expensive equipment, they are not suitable for low-cost agricultural harvesting scenarios. In contrast, monocular 3D object detection methods have the advantage of only requiring a camera and being easy to deploy. However, there is a lack of specialized monocular 3D object detection datasets and algorithms suited for natural scenes in the agricultural field, which limits the application and development of this technology in agricultural automation. To address this, we construct a 3D object detection dataset for wax gourds and propose a network called TPDNet, which aims to capture the 3D information of objects from a single RGB image for fruits and vegetables in fields.