Constrained 6-DoF Grasp Generation on Complex Shapes for Improved Dual-Arm Manipulation

1IIIT Hyderabad, 2Massachusetts Institute of Technology, 3Brown University
Equal contribution

Abstract

Efficiently generating grasp poses tailored to specific regions of an object is vital for various robotic manipulation tasks, especially in a dual-arm setup. This scenario presents a significant challenge due to the complex geometries involved, requiring a deep understanding of the local geometry to generate grasps efficiently on the specified constrained regions. Existing methods only explore settings involving tabletop/small objects and require augmented datasets to train, limiting their performance on complex objects. We propose CGDF: Constrained Grasp Diffusion Fields, a diffusion-based grasp generative model that generalizes to objects with arbitrary geometries and generates dense grasps on target regions. CGDF uses a part-guided diffusion approach that achieves high sample efficiency in constrained grasping without explicitly training on massive constraint-augmented datasets. We provide qualitative and quantitative comparisons, using analytical metrics and in simulation, in both unconstrained and constrained settings, to show that our method generalizes to generate stable grasps on complex objects, which is especially useful in dual-arm manipulation settings, while existing methods struggle to do so.

Architecture

Architectural Overview: Part (a) of the figure shows the architecture of our proposed energy-based model \(\mathbf{E_{\theta}} \). The model takes as input a point cloud and a grasp pose, which is converted into a set of query points. A VN-PointNet-based point cloud encoder generates per-point features. Part (b) shows how these features are distilled into three 2D feature planes oriented along the \(XY\), \(XZ\), and \(YZ\) planes by a convolutional multi-plane encoder. For each grasp pose, feature vectors corresponding to its \(N\) query points are obtained by bilinear interpolation on the feature planes. The grasp feature vector is then computed by \(\mathbf{F_{\theta}} \) and decoded into an energy value by \(\mathbf{D_{\theta}} \). Part (c) shows the grasp diffusion process: grasps are diffused over the object during the forward diffusion process and denoised during the backward diffusion process.
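To make the feature-plane lookup concrete, here is a minimal sketch (not the authors' code) of querying tri-plane features for a 3D point: the point is projected onto the \(XY\), \(XZ\), and \(YZ\) planes, each feature plane is bilinearly interpolated, and the results are aggregated. The function names (`bilerp`, `query_triplane`), the single-channel grids, and summation as the aggregation rule are illustrative assumptions.

```python
# Hypothetical, simplified tri-plane feature query: single-channel
# feature planes stored as 2D lists, coordinates normalized to [0, 1].

def bilerp(plane, u, v):
    """Bilinear interpolation on a 2D grid; u, v in [0, 1]."""
    h, w = len(plane), len(plane[0])
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = plane[y0][x0] * (1 - fx) + plane[y0][x1] * fx
    bot = plane[y1][x0] * (1 - fx) + plane[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def query_triplane(planes, p):
    """Project p onto the XY, XZ, YZ planes and sum the
    bilinearly interpolated features (aggregation rule assumed)."""
    x, y, z = p
    return (bilerp(planes["xy"], x, y)
            + bilerp(planes["xz"], x, z)
            + bilerp(planes["yz"], y, z))
```

In the actual model each plane holds a multi-channel feature map and the per-plane features are produced by a learned convolutional encoder; the projection-and-interpolate step is the same.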

Part-Guided Diffusion

Part-guided diffusion works by utilizing the energy values \(e^{'}_{k} \) and \(e^{''}_{k} \) predicted by the trained EBM \(\mathbf{E_{\theta}} \) on \(P \) (point cloud of the full object) and \(P_{t} \) (point cloud of the target region). \(e^{'}_{k} \) is calculated as \(\mathbf{E_{\theta}}(H_{k}, 0, P) \) and \(e^{''}_{k} \) as \(\mathbf{E_{\theta}}(H_{k}, 0, P_{t}) \), where \(H_{k} \) is the grasp pose at noise scale \(k\). By taking the maximum of the two energies, this strategy guides the grasp from a random pose to a stable configuration near the target region. During a reverse diffusion step, if the grasp \(H_{k} \) is near the constrained region but collides with the full object or is unstable, the energy \(e^{'}_{k} \) is high, which moves \(H_{k} \) toward a more stable pose. Conversely, if the grasp is stable but far from the constrained region, then \(e^{''}_{k} \) is high, and the grasp moves closer to the constrained region. Eventually, the grasp reaches a pose where the energies for both \(P \) and \(P_{t} \) are low, i.e., the grasp lies on the constrained region and is stable.
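The update above can be sketched in code. This is an illustrative toy, not the authors' implementation: the pose is reduced to a single scalar (rather than an SE(3) transform), `energy_full` and `energy_target` stand in for \(\mathbf{E_{\theta}} \) evaluated on \(P\) and \(P_{t}\), and the gradient is taken by finite differences instead of backpropagation through the EBM. The key point it demonstrates is that guiding with the maximum of the two energies descends whichever objective is currently worse, so the pose settles where both energies are low.

```python
import random

def part_guided_step(pose, energy_full, energy_target,
                     step=0.01, noise=0.0, eps=1e-4):
    """One Langevin-style update on the max of the two energies."""
    def guided_energy(h):
        # e_k = max(e'_k, e''_k): the worse energy dominates the guidance.
        return max(energy_full(h), energy_target(h))
    # Finite-difference gradient for illustration only; the trained
    # EBM would supply this gradient analytically.
    grad = (guided_energy(pose + eps) - guided_energy(pose - eps)) / (2 * eps)
    return pose - step * grad + noise * random.gauss(0.0, 1.0)

def denoise(pose, energy_full, energy_target, steps=1000, noise=0.0):
    """Run the reverse (denoising) process from an initial pose."""
    for _ in range(steps):
        pose = part_guided_step(pose, energy_full, energy_target, noise=noise)
    return pose
```

For example, with `energy_full` minimized at one pose and `energy_target` at another, the max-guided descent converges to the point where the two energies are equal, i.e., the best compromise between stability on the full object and proximity to the target region.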

Results

Ours (Constrained)

VCGS (Constrained)


Ours (Unconstrained)

VCGS (Unconstrained)

SE3Diff (Unconstrained)

BibTeX

@misc{cgdf2024,
    title={Constrained 6-DoF Grasp Generation on Complex Shapes for Improved Dual-Arm Manipulation},
    author={Gaurav Singh and Sanket Kalwar and Md Faizal Karim and Bipasha Sen and Nagamanikandan Govindan and Srinath Sridhar and K Madhava Krishna},
    year={2024}
}